Having layered out the process and results for batters, we can now turn our attention to pitchers.
Now, the news is not going to be great. It'll be good, in that this is an advance. But any expectation that we'd uncover anything big would have been highly misguided.
First, we start with our two baselines. How well does wOBAcon (year T) predict wOBAcon (year T+1)? The year to year correlation (minimum 100 batted balls, average of 264) is an r=.08. This sets the ballast (regression toward the mean amount) to a whopping 3000 batted balls. In other words, once a pitcher accumulates 3000 batted balls (which is 5 or 6 full seasons for a starting pitcher), his observed wOBAcon represents about half the pitcher talent and half Random Variation. This is what we are up against when it comes to batted balls. This is why DIPS exists, and this is why FIP (which is a measure that ignores most batted balls, only keeping HR) has staying power.
How does xwOBAcon do, which you can already see on the Savant player pages and in Search? Remember that xwOBAcon is an estimate of the PLAY, looking at the combination of launch angle and speed, with combination being the key word. In that case, the r=.11, which is a slight gain over just using wOBAcon. The ballast is 2000 batted balls. That's actually pretty good, as now we're down to needing 3 or 4 full seasons for a starting pitcher to find half of their true talent.
Let's now go to each layer, starting with the layer of Launch Speed. Now, remember, we are not looking at Launch Speed itself, but the translation of Launch Speed into wOBA. It is not 1:1. Low launch speeds, those under 88 mph, are all equivalent. And launch speeds over 100 mph get progressively more impactful (up to a point). All that work behind the scenes gives us a wOBA_layer based strictly on Launch Speed. And what is the year to year correlation of Launch Speed Layer (year T) to wOBAcon (year T+1)? That is ALSO an r=.11. That's the same as good ole xwOBAcon. Remember, xwOBAcon uses both launch speed and launch angle, in unison, to describe the PLAY. And it obviously does that better than the Launch Speed Layer on its own. But to describe the PLAYER? Well, as we can see by the results, the Launch Angle is just about entirely Random Variation (when used in combination as xwOBAcon does).
When we use ALL the layers, our correlation jumps to r=.16, and our ballast drops all the way down to 1400 batted balls, meaning we need fewer than 3 full seasons for a starting pitcher to find half their talent. The half-full glass view is that we've drastically improved from needing 5 to 6 seasons to now only needing half as many. The half-empty view is that we still need almost 3 seasons, when we'd really like to have only 1.
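The ballast figures above are consistent with the usual regression-toward-the-mean identity r = n / (n + ballast), solved for the ballast. Here is a minimal sketch in Python, assuming that identity is the one in play and that the 264 batted ball average applies to all three correlations (the function name is made up for illustration):

```python
def ballast_from_correlation(r, avg_sample):
    """Solve r = n / (n + ballast) for ballast."""
    return avg_sample * (1 - r) / r

AVG_BATTED_BALLS = 264  # average sample quoted in the text

for label, r in [("wOBAcon", 0.08), ("xwOBAcon", 0.11), ("all layers", 0.16)]:
    print(label, round(ballast_from_correlation(r, AVG_BATTED_BALLS)))
```

Under those assumptions the three values land near 3000, 2100 and 1400 batted balls, in line with the rounded figures quoted.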
Let's go through it layer by layer. Layer for Launch Angle is half the impact of that of Launch Speed, and is basically what causes it to go from r=.11 to r=.16. The difference between the layered approach and the xwOBAcon combo approach is that we are better able to isolate launch angle from launch speed.
As for the other layers, all their p-values are quite high, making them all pretty meaningless. But let's go through them anyway. The batter run speed actually has a negative correlation, but with a p-value of .36, it really could just as well be 0. The fielding alignment has a p-value of .48, and a slightly positive correlation. So again here, the fielding alignment doesn't really carry over in predictability. The spray angle has a slightly negative correlation and an even higher p-value (.62). The Fielder Performance layer is slightly positive, but at a p-value of .93, well, we can easily ignore it. And the Carry Layer, at a p-value of .94 and a coefficient of almost 0, is as meaningless as it gets.
So there you have it, after all is said and done, the two things we care about, Launch Angle and Speed are the two things already in xwOBAcon since its inception. The paradigm shift is to layer them to better isolate them, rather than combine them from the outset. And it gives us a half-glass impact in evaluating pitchers.
(In re-reading this, I have a lot of ALL CAPS. I'm not shouting, just emphasizing. I can edit it to be lower case bold if this bothers anyone.)
As I'm getting near the end of preparing Layered Hit and HR Probability, let me now turn my attention to Layered wOBA (or more specifically because it's on Batted Balls or Contacts, it's actually wOBAcon).
Layered wOBAcon requires the probability of each 1B, 2B, 3B, HR for each layer.
In order to have a baseline, let's just look at how wOBAcon and xwOBAcon correlate with next season's wOBAcon. Among batters with 150 PA in back to back seasons, for the 2021-24 seasons, the correlation of wOBAcon (year T) to wOBAcon (year T+1) is r=.49. With an average sample of 330 PA, we also learn that you want to add ~330 PA of league average wOBAcon (that's the prior) to the current year wOBAcon (that's the observed) to estimate next year's wOBAcon (that's the posterior). Remember that stats class where for 13 weeks they went thru the horribly named Beta Distribution with the even more horribly named alpha and beta parameters? Yeah, this is what they were talking about.
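The prior / observed / posterior recipe above can be sketched as a simple weighted average. This is only an illustration: the league average and the batter's observed wOBAcon below are made-up numbers, and the function name is hypothetical.

```python
def regress_to_mean(observed, sample_size, league_avg, ballast):
    """Blend the observed rate with `ballast` PA of league-average performance."""
    return (observed * sample_size + league_avg * ballast) / (sample_size + ballast)

# Hypothetical inputs, for illustration only (not from the text)
LEAGUE_WOBACON = 0.370
OBSERVED_WOBACON = 0.450

estimate = regress_to_mean(OBSERVED_WOBACON, sample_size=330,
                           league_avg=LEAGUE_WOBACON, ballast=330)
print(round(estimate, 3))
```

With 330 PA observed and 330 PA of ballast, the estimate sits exactly halfway between the observed rate and the league average, which is what an r of about .5 implies.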
As for xwOBAcon (year T) to wOBAcon (year T+1), that's an r=.60. So, add yet another +1 in the win column for x-stats better describing the talent of players than their actual stats do. Whether W/L v ERA correlating to next year's W/L, or ERA v FIP correlating to next year's ERA, or wOBAcon v xwOBAcon correlating to next year's wOBAcon, it's all part of the same pattern: the observed stats are filled with so much noise (Random Variation) that it hides the actual thing they are purportedly trying to measure.
Alright, so let's get back to Layered wOBAcon. The first layer we have is Launch Speed. How does Layered Speed (year T) correlate to wOBAcon (year T+1)? That's an r=.60! Whoa, that's the SAME as xwOBAcon? What's going on here?
Welcome to my world, where for the last 8 years I've been discussing and describing and otherwise deliberating the PLAY v the PLAYER. When Statcast came out there was this enormous rush to take the specifics of a play (launch speed and launch angle, notably) and use that information to purportedly describe what the player did, when it was in fact simply describing the PLAY. This should have been plainly obvious when it came down to looking at 70-80mph batted balls launched just high enough that you'd get a high hit probability: those balls would land over the infielder and in front of the outfielder. Outside of maybe Ichiro and Arraez, NOBODY intends to do that. Every single batter is trying to hit the ball hard, at least 90mph. And so, batted balls hit at under 80mph are undoubtedly mistakes. There are of course mistakes that lead to good outcomes and mistakes that lead to bad outcomes. But from the perspective of the talent of the player, these are better bundled together as launch speed mistakes.
Similarly, you have what scouts call Major League Outs: these are batted balls that are hit 100+ mph, but at such a high launch angle (45+ degrees) that it ends up being a very high fly out. These are better addressed as launch angle mistakes. It takes TREMENDOUS power to mishit a ball to get a 45 degree launch angle and still hit the ball 100+. If you have a batter already at the major league level, these launch angle mistakes are far easier to overcome than launch speed mistakes.
What happens with the x-stats that bundle things together, like xwOBAcon does, is that it is only focused on the PLAY. And so, xwOBAcon looks at the outcome of that combo of speed+angle, and based on the historical outcome of that combination decides how good a hit that was. Doing that removes the individuality of each of the speed and angle.
In other words, this combination approach is actually adding what is analogous to Random Variation in trying to describe the player by essentially overfitting on the play. From the perspective of the play, it's not an overfit. From the perspective of the player, it IS an overfit. We need a paradigm shift here.
This is where a Layered approach comes in. First, we focus on the primary thing that will describe the PLAYER (launch speed) and then we do our best to describe the PLAY. And incredibly, we ALREADY achieve an r=.60 doing only that.
The next layer we add is Launch Angle. Doing that gives us a small boost to r=.64. Adding the Launch Angle as a layer in this manner now allows us to better describe THE PLAYER. Sure, we lose some value in describing THE PLAY, but that's a small (temporary) loss.
From here on out, we can add each layer, one at a time (Carry, Spray Angle, Batter Running Speed, Fielding Alignment, Fielder Performance) so that we can TOTALLY describe the PLAY. You see in this paradigm shift, by accounting for every variable, we will get an r=1 in terms of describing the hit or out. And by leaving it as Layers, we can then decide which are actually the ones we care about in describing the PLAYER.
The most impactful is the launch speed, as we already presumed and surmised. The batter's running speed is also important: this is really a trait ingrained to the player. See, this is what we are after here, to establish a tool or trait for each player. Launch Speed is a powerful trait (a combination of Bat Speed and Quality of Contact), and at the major league level, we've already selected for players to have decent Quality of Contact. Running speed is a natural trait as well.
The next on the list is Launch Angle, at about 20% the weight of Launch Speed.
The Carry layer is impactful, but in a negative sense. While we can describe the individual plays by how much Carry the ball has (whether it's by the spin imparted, or the wind or the specific traits of that particular snowflake of a ball), these actually are not helping in describing the batter.
The Spray Layer has almost no weight at all. With a p-value of 0.51, it becomes an easy feature to ignore. Yes, you need it to describe THE PLAY. But when it comes to describing the effectiveness of the player, we don't need the Spray Layer. Yes, it becomes useful to describe the PROFILE of the player (pull, spray, etc), but not their overall performance.
The Fielding Alignment Layer has no weight at all in describing the player.
So, there you have it, the three critical components are Launch Speed, Launch Angle, and Running Speed. Exactly what we already have in the x-stats. Except re-arranged and approached in a different way to better describe the player than the x-stats. And dependent on the remaining layers (Carry, Spray, Fielding Alignment, Fielder Performance) to better describe the play than the x-stats.
I talked a lot about this two years ago. To catch everyone up, the idea is that a hit (or out) happened. And our job is simply to explain how that hit (or out) happened. It's not 35% of a hit or 72% of a hit or even 1% or 99%. It's exactly 100% (if it was a hit) or 0% (if it was an out). And so, what makes up that 100% (or 0%)?
We have making contact. Just making contact will lead to a hit about 33% of the time.
We have the launch speed. That can bring that number as high as 80% with maximum speed or as low as 20% with minimum speed. In other words, the Layer of launch speed will be somewhere between -13% and +47% (for an average of 0% for an average batter).
How about launch angle? Well, depending on the launch speed, that can explain as high as 100% (think easy HR) or as low as 0% (think easy popup). So, the Layer of launch angle can be between -70% and +80% (for again an average of 0%).
What else can explain the hit or out? We have the carry of the ball (meaning the spin and/or wind and/or specific ball characteristics... remember every ball is as unique as a snowflake, even if it comes from the same batch).
We have the spray angle (meaning finding those gaps between fielders or clearing the fence).
We have the running speed of the batter to beat out grounders.
We have the fielding alignment on that specific play.
We have the specific fielder involved on that specific play (did they make a great play, or a terrible play, or something in-between).
Other than the unrecorded events like lights, sun, and catwalks, all of these variables comprise the totality of what can happen to a batted ball that will turn it into a basehit or an out.
And so, that's what we do. We go thru each one of the 120,000 batted balls, and make sure that every layer is given a value so that the sum total of those layers is exactly 100% (for a basehit) or 0% (for a batted ball out). In other words, we will exactly describe each play.
WHY
Why do we do that? Well, now that we have each component perfectly described retroactively, we can use that information to ALSO predict future batted balls. Since some components are more in the control of the batter, then those components will carry more weight for predictions.
Let's take an obvious one. Pete Crow-Armstrong makes a highlight play, earning +0.90 OAA (outs above average). If PCA is earning +0.90 OAA, then guess what, the batter is going to get -90% for the Fielder Layer in Layered Hit Probability. Remember, we are trying to explain the result (an out in this case) and so how do we get to 0% hits? Well, the batter may have gotten great contact at a great launch angle and he was probably sitting at +90% in layers, but the fielding layer was worth -90% and so that's 0% hits. That describes that play. But what about the PLAYER? Well, in that case, we likely put most of our weight on his launch speed and launch angle.
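To make the bookkeeping concrete, here is a rough sketch of how the layers on a play like that one might add up. Every layer value is hypothetical, chosen only so that the batter sits at 90% before the fielder is considered; I'm also assuming the 33% contact baseline is counted inside that 90%.

```python
BASELINE = 0.33  # hit rate from simply making contact, per the text

# Hypothetical layer values for one play: well-struck ball, highlight catch
layers = {
    "launch_speed": 0.30,
    "launch_angle": 0.25,
    "carry": 0.02,
    "spray_angle": 0.00,
    "running_speed": 0.00,
    "fielding_alignment": 0.00,
    "fielder": -0.90,  # the +0.90 OAA play, charged against the batter
}

def explained_outcome(baseline, layers):
    """Sum the baseline and every layer; should be 1.0 (hit) or 0.0 (out)."""
    return baseline + sum(layers.values())

total = explained_outcome(BASELINE, layers)
before_fielder = total - layers["fielder"]
print(round(before_fielder, 2), round(abs(total), 2))
```

The play itself is fully explained (the layers net out to 0% hits), while the batter's own contribution is still visible in the speed and angle layers.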
WEIGHTS
How much weight? And how about all the other layers? Well, I'm glad you asked.
Running a correlation of current season layers to next season BACON (batting average on contact), we get an r=0.56. In contrast, the traditional xBA gives us a correlation of r=0.45. So, right away, we know we've got something value-added here.
Let's look at it layer by layer. The first most obvious one is launch speed. At a p value of 0.0000000 (make that 39 zeroes), it's clear the Launch Speed Layer is critical. Its weight is 0.64. I don't think we need to belabour the value of launch speed.
Launch Angle Layer is the next most important (also a p-value of 0, but this time to 7 zeroes). Its weight is 0.28.
Batter Running Speed Layer, with a p-value of 0, and a full weight of 1.00. This one makes the most sense: the batter's running speed IS the actual description of the batter. This is unlike launch speed and launch angle which is a product of his skill, and not the skill itself. It's pretty close naturally. But it is not exact. That's why launch speed has a weight of 0.64 and not 1, and why launch angle is 0.28 and not 1.
That said, on a seasonal basis, batter by batter, the Launch Speed layer is between -9% and +14%. Taking 0.64 of that, and we can say that the Launch Speed Skill will range from -6% to +9%.
Launch Angle Layer, observed at +/-12% would establish a Launch Angle skill of -3% to +3%.
Running Speed Layer goes from -1% to +2%.
I hope this is making sense. Anyway, let's keep going. The next important layer, far behind these, is Fielding Alignment Layer, with a p-value of 0.09. Now, that is a high number, high enough that we might even suggest that Fielding Alignment has no predictive value. Its average coefficient is 0.11, but the range of possible coefficient is -.02 to +.25. It's enough for us to simply not use it at all for predictive purposes. To the extent that you want to use it: the observed range is +/-6%, and so the Fielding Alignment Skill is +/-0.6%.
After that, it's the Spray Angle Layer, at a p value of 0.22. This one is even more clear that it adds almost no value. The weight is 0.10, but it can possibly range from -0.06 to +0.25. When you see a p value that high, you are basically going to say Random Variation. The observed range is +/-5%, which means the Spray Angle Skill is +/-0.5%. But like I said, it may as well be 0.
Finally, the Carry Layer, at a p value of 0.48 is really screaming Random Variation. And with a coefficient close to 0, it's really not worth even discussing it. The observed range is +/-5%, but the skill range is 0%. In other words, this Layer as well as the Fielding Layer, is perfectly apropos for describing the PLAY and totally unusable in describing the PLAYER.
To summarize the weights and skill ranges of each:
Weight, Skill Layer, Component
0.64, -6% to +9%, Launch Speed
0.28, -3% to +3%, Launch Angle
1.00, -1% to +2%, Running Speed
0.11, -0.6% to +0.6%, Fielding Alignment
0.10, -0.5% to +0.5%, Spray Angle
0.00, 0%, Carry
0.00, 0%, Fielder
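The skill ranges in the table above come from scaling each observed seasonal range by its weight. A quick sketch of that arithmetic, using the weights and observed ranges quoted earlier (the dictionary layout and function name are mine; Carry and Fielder are left out since a weight of 0 gives a skill range of 0%):

```python
# (weight, observed low %, observed high %) as quoted in the text
LAYERS = {
    "Launch Speed":       (0.64, -9, 14),
    "Launch Angle":       (0.28, -12, 12),
    "Running Speed":      (1.00, -1, 2),
    "Fielding Alignment": (0.11, -6, 6),
    "Spray Angle":        (0.10, -5, 5),
}

def skill_range(weight, low, high):
    """Scale the observed seasonal range by the layer's predictive weight."""
    return weight * low, weight * high

for name, (weight, low, high) in LAYERS.items():
    s_low, s_high = skill_range(weight, low, high)
    print(f"{name}: {s_low:+.2f}% to {s_high:+.2f}%")
```

The unrounded products differ slightly from the table in places (0.11 x 6% is 0.66%, listed as 0.6%), so treat the table as rounded.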
All in all, what are we most interested in, with a hit probability metric? Launch Speed, Launch Angle, and Running Speed. Which is how xBA currently exists.
However, breaking it up into components allows us to weight each of the three separately. And so, that's how we can bring our correlation from r=0.45 using the amalgamated xBA to r=0.56 using a Layered approach.
And yes, the individual layers will be available, batter by batter on a seasonal basis, on Savant at some point. I don't know when.
FIP: pitcher descriptive metric; describes (part of) pitcher's season
ERA: pitcher+fielding+timing descriptive metric; describes (all of) pitcher's season, which is obfuscated by also describing team defense + random variation
Of the two, FIP better describes a pitcher's season
Will get to the headline question in a second. First the setup.
Suppose you have two teams, the Expos who are favoured to beat the Spiders, in 52% of their matchups, on neutral sites.
When playing at the homesite of the Expos, they win 56% of the time. When they play as visitors against the Spiders, they win only 48% of the time. This is a typical home site advantage in baseball.
In a winner-take-all game, at the Expos home site, the Expos have a 56% win expectancy.
In a best-of-7 game series, where the Expos are home for 4 of the 7 games, the Expos have a win expectancy of 55.6%
That's right: you may think that there is more randomness in a single game, but that is not true. The Expos have a better chance of winning a one-game series than a best-of-7 series.
ANSWER TO THE QUESTION
Indeed, in a best-of-3, where all the games are at the home site of the Expos, the Expos have a win expectancy of 59.0%.
Do you know what kind of best-of-X series you need, with an even split of homesite games (except for the last game naturally)? Would you believe... 27? That's right, a best-of-27 series, with 14 games at the Expos home site and 13 games at the Spiders home site has the same win expectation as a best-of-3 all-Expos-home games.
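If you want to check these numbers yourself, here is one way to sketch it in Python. It assumes each game is independent with the fixed 56% / 48% win probabilities above, and it plays out all N games on paper, which gives the same answer as stopping once the series is clinched. The function name is just for illustration.

```python
def series_win_prob(game_probs):
    """Chance of winning a best-of-N series, where N = len(game_probs) is odd.

    Assumes independent games; winning a majority of all N games is
    equivalent to clinching the series first.
    """
    need = len(game_probs) // 2 + 1
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in game_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # lose this game
            nxt[k + 1] += prob * p     # win this game
        dist = nxt
    return sum(dist[need:])

HOME, AWAY = 0.56, 0.48  # Expos win probability at home / on the road

one_game = series_win_prob([HOME])
best_of_3_home = series_win_prob([HOME] * 3)
best_of_7 = series_win_prob([HOME] * 4 + [AWAY] * 3)
best_of_27 = series_win_prob([HOME] * 14 + [AWAY] * 13)

for label, p in [("1 game", one_game), ("best-of-3, all home", best_of_3_home),
                 ("best-of-7", best_of_7), ("best-of-27", best_of_27)]:
    print(f"{label}: {p:.3f}")
```

The order of home and road games doesn't matter for the final win expectancy under these assumptions, only how many of each there are.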
Next time someone complains that a 3-game series is too short, or too random, remind them of the above fact.
Tarik Skubal got 228 outs on strikeouts, facing 753 batters. Since the league average strikeout rate is 22.6%, then the league average pitcher would get 170 strikeouts on 753 batters. Skubal therefore got 58 more strikeouts than the league average.
The run value of an out, strikeout or otherwise, in 2024 is roughly -.264 runs. In other words, the run potential is reduced by .264 runs for every strikeout.
Since Skubal is +58 on strikeouts, and the run value is .264 runs for each strikeout, then Skubal saved about 15.4 runs on strikeouts.
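The strikeout arithmetic above, as a small sketch (the function name is mine; the inputs are the rounded figures from the text):

```python
def runs_saved_on_strikeouts(strikeouts, batters_faced, league_k_rate, run_value_out):
    """Strikeouts above league average, and the runs those extra outs save."""
    expected = league_k_rate * batters_faced
    extra = strikeouts - expected
    return extra, extra * run_value_out

extra_k, runs = runs_saved_on_strikeouts(228, 753, 0.226, 0.264)
print(round(extra_k), round(runs, 1))
```

Using the rounded inputs, this comes out to about 58 extra strikeouts and a little over 15 runs; the small gap to the +15.4 shown presumably comes from the unrounded league rate and run value.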
Here's how Skubal looks for every event, where plus is good for the pitcher, and minus is bad:
+15.4 SO
+ 7.9 BB+HBP
+10.5 HR
+ 5.0 2B+3B
+ 2.9 1B+ROE
- 2.7 Fielded Outs
The total of all that is +39 runs, which leads MLB in 2024.
Notice also that I broke the events into two: the top set are what is called the Three True Outcomes, (TTO, or the unfieldable balls), while the bottom set are balls in park (BIP, or the fieldable balls).
Skubal is +34 in TTO and +5 in BIP. In other words, most of Skubal's value derives from Skubal himself without relying on his fielders.
The top pitcher in TTO in 2024 is Chris Sale at +40 runs, while he was -7 runs in BIP. That's right, Sale got below league average results on fieldable balls. Whether that is directly a result of Sale being a poor pitcher, or a bad job by the fielders, or a bad job by the team in setting up the fielding alignment, well, that's a different discussion. The key is to separate them in this manner, so we can focus on the one aspect, TTO, most in Sale's control, while acknowledging the other aspect, BIP.
Paul Skenes was fourth in TTO: +25 runs. He was +3 in BIP. Crochet was third at +25 in TTO and -10 in BIP.
As you can see, our top 4 in TTO totalled +123 runs, while they were -9 runs in BIP. In other words: they got fantastic results when not relying on their fielders, and got below league average results when relying on their fielders.
How about pitchers who got fantastic results with their fielders? Bryce Miller looks like this:
+ 3.3 SO
+ 5.4 BB+HBP
+ 0.0 HR
+ 5.6 2B+3B
+ 9.8 1B+ROE
+ 9.3 Fielded Outs
All in all: +9 runs on TTO and +25 runs on BIP. Overall, that's +34 runs. Because he got most of his value on BIP, the understanding is that some of that value is not about the pitcher himself, but rather his fielders or his team. Officially, he carries the stat line. In reality, he's the figurehead for everything that the pitcher and his fielders do.
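The TTO / BIP split for both pitchers can be tallied straight from the event lines above. A short sketch (the grouping constants and function name are just for illustration):

```python
TTO_EVENTS = ("SO", "BB+HBP", "HR")                 # unfieldable
BIP_EVENTS = ("2B+3B", "1B+ROE", "Fielded Outs")    # fieldable

skubal = {"SO": 15.4, "BB+HBP": 7.9, "HR": 10.5,
          "2B+3B": 5.0, "1B+ROE": 2.9, "Fielded Outs": -2.7}
miller = {"SO": 3.3, "BB+HBP": 5.4, "HR": 0.0,
          "2B+3B": 5.6, "1B+ROE": 9.8, "Fielded Outs": 9.3}

def split_runs(events):
    """Return (TTO runs, BIP runs, total runs) for one pitcher."""
    tto = sum(events[e] for e in TTO_EVENTS)
    bip = sum(events[e] for e in BIP_EVENTS)
    return tto, bip, tto + bip

for name, events in [("Skubal", skubal), ("Miller", miller)]:
    tto, bip, total = split_runs(events)
    print(f"{name}: TTO {tto:+.1f}, BIP {bip:+.1f}, total {total:+.1f}")
```

Note that Miller's one-decimal event values sum to a bit under +34; the +34 in the text is the sum of the rounded +9 and +25.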
So, what does this presentation buy us? Well, if it's not apparent, the TTO is essentially the same thing as FIP. Whereas FIP is set to the ERA scale, this presentation focuses on runs. Now, you may think: well, isn't ERA runs? Yes. But, for some reason the FIP-naysayers focus on that scaling to ERA to claim FIP is not a stat that describes what actually happened.
And in the case of runs on TTO, it's very clearly about describing what really happened. We take a direct path from event to runs.
What also happens by focusing on runs and breaking it into components is that now everything adds up very clearly and cleanly. It opens the door to seeing how players do year by year, and so we can see how much run value a pitcher gets on his TTO and his BIP. Indeed, we can see the run value at each event (SO, BB, HR, 2B, 3B, field outs, etc).
Anyway, this is how run values, wOBA, and FIP all tie together. They are all basically talking about the same thing, but just focus on different parts of the game or use different scales.
As we know, when Corbin Burnes had his historic FIP season and won in 2021, that was a paradigm shift. From 2006-2020, the voters voted very consistently. And now, they are in a transition period. As a result, I have two Predictors, one is the Classic that works for 2006-2020, and another the New FIP-enhanced version that works almost as well as the Classic for 2021-2023. The Classic is probably a smidge ahead still, and it's a matter of time until the New version takes over. When that happens, I don't know. So, let's run both, with the Classic listed first, and the New in parens.
1. Skubal (1)
... way way way ahead
2A. Lugo (2)
2B. Burnes (5)
...
4. Valdez (6)
5A. Ragans (3)
5B. Blanco
5C. Gilbert (4)
5D. Miller
Anyone within ~1 point I put with the "letter" designation, signifying essentially a tie, and likely needing the New version as the tie-breaker
Even so, I consider FOUR points as being essentially tied, and so that's where the tiebreaker comes in. In the above, that means Valdez is really tied with the gang listed at #5
So, what do we learn here? Well, Skubal will win the Cy Young, easily. Number 2 will be Seth Lugo.
The uncertainty will be between Burnes who is NO LONGER the FIP-hero and Ragans. Burnes is ahead of Ragans by almost 6 points using the Classic Predictor, while Ragans is ahead of Burnes by almost 5 points using the New Predictor.
If we treat it as 3/4 Classic, 1/4 New, we get this as our top 6:
Skubal
Lugo
Burnes
Ragans
Valdez
Gilbert
Going 1/4 Classic, 3/4 New:
Skubal
Lugo
Ragans
Burnes
Gilbert
Valdez
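To see why Burnes and Ragans flip between the two blends, here is a rough sketch using only the margins mentioned above. The actual point totals aren't shown, so I'm rounding "almost 6" and "almost 5" to 6 and 5, and assuming the blend is a straight weighted average of the two Predictors.

```python
def blended_margin(classic_margin, new_margin, classic_weight):
    """Weighted average of the two Predictors' margins (Burnes minus Ragans)."""
    return classic_weight * classic_margin + (1 - classic_weight) * new_margin

CLASSIC_MARGIN = 6.0   # Burnes ahead by "almost 6" (approximation)
NEW_MARGIN = -5.0      # Ragans ahead by "almost 5" (approximation)

mostly_classic = blended_margin(CLASSIC_MARGIN, NEW_MARGIN, 0.75)
mostly_new = blended_margin(CLASSIC_MARGIN, NEW_MARGIN, 0.25)
print(mostly_classic, mostly_new)
```

A positive margin puts Burnes ahead and a negative one puts Ragans ahead, which matches the order of the two lists.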
And in all that will be Clase, who will finish somewhere between third and seventh. To finish second, he'd have to be considered the equal to Lugo, Burnes, Ragans. Given how little support the best relievers have received since Britton's incredible run in 2016, it'll be surprising if Clase is listed on all 30 ballots. As Britton was listed on 24/30, that's probably what Clase has as an over/under.
***
1A. Sale (1 runaway)
1B. Wheeler (2)
3. Skenes (3)
4A. Imanaga (9)
4B. King (6)
6. Lopez (7)
7. Cease (5)
..
10. Webb (4)
As close as Wheeler made it in the end, Sale's triple crown and runaway lead in the New Predictor will be an easy win for him. Whether it's unanimous is the only question.
Skenes will be third.
So, here's where the uncertainty happens, in the downballot. Cease/Webb are the FIP heroes, while Imanaga/King are the Classic heroes.
Here's how it looks 3/4 Classic, 1/4 New:
Sale
Wheeler
Skenes
King
Imanaga
Lopez
Cease
Webb
Going 1/4 Classic, 3/4 New:
Sale
Wheeler
Skenes
Cease
Webb
King
Lopez
Imanaga
In order to see where we are in the paradigm shift, just look to see where Cease/Webb finish relative to Imanaga/King. Imanaga/King are ahead of Cease by 5 points and Webb by 10 points with the Classic Predictor. With the New Predictor, Webb is ahead of all of them, but especially with Imanaga by over 10 points. So, Webb/Imanaga especially will be the tell.
If you see a ballot that looks something like this:
Sale, Wheeler, Skenes, Webb, Imanaga
or
Sale, Wheeler, Skenes, Imanaga, Webb
Then you will see this makes no sense, as the voter has basically decided to not decide on their view. They've basically taken the position that they have no position and are still trying to balance everything out. A vote for Webb is a vote for Cease. And a vote for Imanaga is a vote for King. To choose one from each group is the reason we are still in this paradigm shift.
On the left image, the compass-like layout is showing the arm angle of every pitcher, with Zack Wheeler highlighted. The average angle for all of Wheeler's pitches is 27 degrees. The arm angle is measured in two dimensions, based on when the ball is released, and relative to the shoulder at that point in time.
The high arm slot pitchers are over 60 degrees, for both RHP (oRange circles) and LHP (purpLe circles). Pure sidearmers are at 0 degrees either way.
Small note: for LHP, you will see I put in parentheses an angle following the Cartesian plane standard. So, 0 degrees for a LHP is shown also as 180 degrees, while 60 degrees for a LHP is shown also as 120 degrees. This will become clear in a moment why I did that.
MIDDLE IMAGE
In the middle chart, we see the movement of all of Wheeler's pitches. Wheeler throws his sinker with an arm angle of 24 degrees (somewhat similar to the 27 degrees he throws all his pitches). That dark gray line represents his arm angle of 24 degrees.
Also on that chart is a dark orange line that goes to the center of his sinker movement chart: that represents the angle of movement of his sinker, which is 21 degrees.
As you can see, the angle of movement of his sinker is somewhat close to the angle of his arm. Logically, there would have to be some kind of relationship between how you throw your pitches and how a ball moves. The arm angle is just one factor. How the ball rolls off the fingers would be another, and this would be most obvious with sliders. And another one: by manipulating the orientation of the seams, you can trigger the airflow around the ball to push the ball in a certain direction more than it would otherwise move (aka Seam-Shifted Wake or SSW).
TOP RIGHT
The top right chart plots all of the release angles you see from the left chart (using the Cartesian standard) on the x-axis, along with the movement angle (that middle chart, but for the sinkers of all pitchers) on the y-axis. As you can see, there is a strong 1:1 relationship between arm angle and movement angle (for sinkers). Some of the pitchers buck the trend, like Tyler Rogers (that bottom left circle), with an arm angle of minus 65 degrees, but a movement angle of minus 84 degrees, for a 19 degree deviation.
BOTTOM RIGHT
The bottom right chart keeps the x-axis of the top chart and shows the y-axis as the movement angle minus the arm angle. In other words, how much deviation is there in the sinker movement, relative to the arm angle. Here, it becomes a bit clearer that there is additional movement arm-side.
For RHP, the average arm angle is 34 degrees, while the movement angle is 28.5 degrees, so there's an average of 5.5 degrees of deviation, an extra 5.5 degrees of drop (or sink, hence the term sinker).
For LHP, the average arm angle is 33 degrees (or 147 in Cartesian), with an extra 5.6 degrees of sink.
And this is how arm angle and movement angle relate to each other, for sinkers.
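For anyone wanting to reproduce the two conventions used in these charts, a small sketch follows. The function names are mine, and the sign convention (negative deviation meaning more sink than the arm angle alone would suggest) is my reading of "movement angle minus arm angle".

```python
def to_cartesian(arm_angle, throws):
    """Arm angle on the Cartesian plane: unchanged for RHP, mirrored for LHP."""
    return arm_angle if throws == "R" else 180 - arm_angle

def deviation(movement_angle, arm_angle):
    """Movement angle minus arm angle, as on the bottom right chart."""
    return movement_angle - arm_angle

print(to_cartesian(33, "L"))    # average LHP arm angle
print(deviation(28.5, 34))      # average RHP sinker
print(deviation(-84, -65))      # Tyler Rogers
```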
***
I actually had wanted to start this for 4-seam fastballs, but there were a few pitchers that were way off. In looking at those pitchers, it became clear the reason: they were likely throwing cutters, not 4-seam fastballs. While we investigate those pitchers, I turned my attention to sinkers to better illustrate the concept.
***
Fans of Matt Lentzner may remember this article from 15 years ago at Hardball Times, as a precursor to his Pitching Peanut (slideshow or powerpoint).
It is very (very very) simple to figure out Runs Above Average (RAA) for a pitcher. I'll use Paul Skenes as the example.
Take the league average ERA (4.086) and subtract our pitcher's ERA (1.992). That makes Skenes 2.094 runs per 9 IP better than league average.
Since Skenes has 131 IP, we take the above number (2.094/9) and multiply by 131 to give us +30.5 runs above average.
That's it. That is Runs Above Average using ERA-only. That figure for Skenes is 4th highest in MLB, behind Sale (+34 runs), Skubal and Wheeler (+33).
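In code, the whole ERA-only calculation is one line. A sketch using the Skenes numbers above (the function name is mine):

```python
def era_only_raa(league_era, pitcher_era, innings):
    """Runs Above Average from ERA alone: per-9 gap scaled to innings pitched."""
    return (league_era - pitcher_era) / 9 * innings

skenes = era_only_raa(4.086, 1.992, 131)
print(round(skenes, 1))
```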
Now, you may be asking: what about park factors? Baseball Reference has Skenes as pitching in slightly batter's parks. So, that simple league average of 4.086 is actually too simple, since that figure is the same for all pitchers. We know that can't possibly be true. Skenes also faces tougher competition than average. Skenes supposedly has weaker fielding support than others. When you make all these adjustments, Skenes actually ends up being +41.5 runs above average. Remember, unadjusted he was at +30.5 runs above average. So, the adjustments give him an extra +11 runs. That's right, his 1.99 ERA is actually NOT giving him enough credit.
Since Baseball Reference is terrific in how they share their data, it's really quite simple to compare the ERA-only RAA to the fully-adjusted RAA they provide.
On this chart, on the x-axis is the ERA-only RAA. If you don't want anything adjusted and you just want to rely on ERA, then just look at those numbers.
The y-axis is the bonus (or deduction) you have to apply to your pitcher to account for the context that they end up pitching in. Skenes for example is in the right corner, at 30; 11. That means his ERA-only RAA is +30 runs, and he has a +11 run bonus for his context. So, he's worth +41 RAA.
Some pitchers get FAR more bonus than that. Hunter Greene gets +19 runs of bonus for his context. That means his ERA is really clouded, practically Coors-like in its effect. So, he's +20 runs for his ERA-only and another +19 runs for the context, for a total of +39 RAA.
Erick Fedde is +13 for his ERA-only, and another +17 for his context, giving him +30 runs above average.
We can compare Cy Young candidates Cole Ragans (+17, +10) to Logan Gilbert (+18, -10). You see, both are very similar based on their ERA. But according to Reference, Ragans faced a tough context, while Gilbert had a pretty easy context. That's a 20 run gap between the two in terms of their context. So Ragans ends up being +27 while Gilbert is only +8. In other words, instead of Ragans being 1 run behind Gilbert, he's 19 runs ahead, all because of the 20 run difference in their context.
Now, there's no question that if you are a Mariners fan, you will disagree, and a Royals fan is quite happy. That's unfortunately how these contexts get interpreted: how does it affect MY player.
Chris Flexen is one of the worst pitchers in baseball using ERA, at -18 runs. But Reference says he also had one of the toughest pitching environments to the tune of +17 runs. So overall he ends up being practically league average at -1 runs from average.
Did Chris Bassitt have an ordinary season (-1 RAA)? Or did he have one of the easiest contexts in all of baseball (-15 runs) so that he actually had a disastrous season (-16 RAA leading to -0.1 WAR)?
By ERA, Bassitt is 17 runs better than Flexen. By fully-adjusted Reference method, Flexen is 15 runs better than Bassitt. One had an average season, one had a disastrous season. And which pitcher had which is based on whether to fully trust ERA or to fully accept the adjustments.
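The bookkeeping behind these comparisons is just one addition per pitcher. Here is a minimal sketch, using only the rounded chart values quoted above (the function name `adjusted_raa` is mine, not Reference's):

```python
# (ERA-only RAA, context adjustment), in runs, as quoted in the text above.
pitchers = {
    "Skenes": (30, 11),
    "Greene": (20, 19),
    "Fedde": (13, 17),
    "Ragans": (17, 10),
    "Gilbert": (18, -10),
    "Flexen": (-18, 17),
    "Bassitt": (-1, -15),
}


def adjusted_raa(era_only, context):
    """Fully-adjusted RAA: the ERA-only RAA plus the context bonus (or deduction)."""
    return era_only + context


for name, (era_only, context) in pitchers.items():
    total = adjusted_raa(era_only, context)
    print(f"{name:8s} ERA-only {era_only:+4d}  context {context:+4d}  total {total:+4d}")
```

Whether you trust the second column is the whole debate; the code only shows how much of each pitcher's total rides on it.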
Reference lays it all out there for you so you can see what they are doing. You either buy it or you don't. But the transparency is something to be commended.
I looked at the Cy Young voting for 2018-23, excluding 2020. That's 10 Cy Young winners.
There were 100 names listed on those ballots (an average of 10 pitchers per Cy-season), with 70 unique pitchers. Gerrit Cole was listed all 5 times. Three-timers: Burnes, deGrom, Verlander, Gausman, Scherzer.
Let's talk about relief pitchers. There were 10 of them, with Edwin Diaz the only relief pitcher to appear in two different seasons (2018, Mariners; 2022, Mets).
The best showing by a relief pitcher was in 2018 with Blake Treinen, who appeared on 8 of 30 ballots. This is the BEST showing for a relief pitcher over these 10 Cy Young seasons.
There were 300 ballots cast in these seasons. Not a single one had a relief pitcher get a single first place vote. And given Sale/Skubal in 2024, that's going to continue.
Only 1 of the 300 ballots had a relief pitcher appearing in 2nd place (Diaz, 2022).
Only 4 of the 300 ballots had a reliever in 3rd place (Liam Hendriks with 3, and Treinen with 1).
Only 5 of the 300 ballots had a reliever in 4th place (3 Treinen, 1 Hader in 2018, 1 Yates in 2019).
Finally, 19 of the 300 ballots had the reliever in 5th place.
And this is where we are with relievers: 29 of the 1500 slots on the 300 ballots had a relief pitcher named! Five of the ten relievers who got Cy Young votes only got votes as a token 5th place.
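For anyone who wants to check the tally, here is a quick sketch of the arithmetic, using only the counts listed above (variable names are illustrative):

```python
ballots = 300          # ballots cast across the 10 Cy Young races
slots_per_ballot = 5   # each ballot names 1st through 5th place

# Number of ballots naming a reliever at each place, from the counts above.
reliever_votes_by_place = {1: 0, 2: 1, 3: 4, 4: 5, 5: 19}

total_slots = ballots * slots_per_ballot
reliever_slots = sum(reliever_votes_by_place.values())
share = reliever_slots / total_slots

print(f"{reliever_slots} of {total_slots} slots went to relievers ({share:.1%})")
```

That is under 2% of all slots, and roughly two-thirds of those are 5th-place mentions.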
So where does this leave Clase in 2024? Well, he won't get any 1st place votes. As for 2nd thru 5th, he's competing with: Lugo, Burnes, Ragans, Valdez. Given how relievers have been treated, it would be a huge win if Clase appears on half the ballots. I suspect he'll top off with at most 5 votes for 2nd place, and at most 10 votes for 2nd+3rd. Meanwhile, one of the remaining starters will likely get at least 15, if not 20 votes for 2nd+3rd place. Clase just won't be able to compete with that.
In the end, Clase will likely finish at best 3rd, and at worst 6th place. Treinen finished in 6th place while getting 8 votes (1 3rd, 3 4th, 4 5th). Clase will likely finish better than that. The last time a reliever finished better than 4th overall was, I dunno, when Eric Gagne won? Kimbrel, Kenley, Aroldis all topped off at 4th I believe. So, I'd look for Clase to finish 4th or 5th.
In trying to find players who have a particular spray angle tendency, and thereby really mess with fielding alignments, I tried a different approach: I will let the clubs tell me who has a particular spray angle tendency. How do I do that? Well, in 2021+22, clubs were totally allowed to place their fielders anywhere they wanted. So, I simply figured out how often batters were shifted. Then, in 2023+24, clubs were totally prevented from stacking fielders to one side. Therefore, if the spray tendency of the batters mattered, we would see it in the x-stats which specifically ignore the spray tendency of the batters.
I have 24,931 plate appearances from LH batters in 2023+24 who were heavily shifted (80%+ of the time) in 2021+22. Those batters had an xwOBA of .339. Remember, this is only using launch angle+speed, seasonal sprint speed (and walks and strikeouts and hit batters). It ignored the spray angle. What was their actual wOBA? .340.
How about the very opposite: LH batters who were shifted less than 20% of the time in 2021+22, how did they do in 2023+24? Actual .306, xwOBA of .308.
Here are the five groups of LHH, from least heavily shifted, to most heavily shifted, with Actual wOBA first, and xwOBA next:
.306, .308
.319, .315
.316, .320
.320, .320
.340, .339
As you can see, no pattern. This is unlike for example Sprint Speed. When we exclude seasonal Sprint Speed, and then group our players by Sprint Speed, we in fact DO see a pattern. This is called a Systematic Bias. This is why we subsequently included Sprint Speed as a parameter to counteract this bias and neutralize it.
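One way to look for a Systematic Bias is to line up the actual-minus-expected gap for each group and see whether it drifts in one direction. A minimal sketch using the five LHH groups above, expressed in wOBA points (thousandths) to keep the arithmetic exact:

```python
# (actual wOBA, xwOBA) in points, ordered from least to most heavily shifted.
groups = [(306, 308), (319, 315), (316, 320), (320, 320), (340, 339)]

# Positive residual: the group beat its xwOBA. Negative: it fell short.
residuals = [actual - expected for actual, expected in groups]
mean_residual = sum(residuals) / len(residuals)

for rank, gap in enumerate(residuals, start=1):
    print(f"shift group {rank}: actual minus xwOBA = {gap:+d} points")
print(f"mean residual: {mean_residual:+.1f} points")
```

The residuals bounce between -4 and +4 points with no trend from the least to the most shifted group, which is what "no pattern" looks like.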
The above, by the way, also applies to RHH: there was no pattern with them either.
Fielding Run Value (FRV) is what you will find on Savant (and Fangraphs) and is the metric I spearheaded.
DRS (Defensive Runs Saved) is what you will find on Fangraphs and Reference, and is the metric spearheaded by John Dewan.
Now, one way to measure a metric is to see how well it can predict the OTHER metric in the year after. I've done this for Catcher Framing for example, where we learned that the Steamer metric actually predicts next year's Savant Framing metric as well as current year's Savant does. In other words, Steamer is value-added, as it is like Savant, and more.
So, this is what I did, and given the incredible layout of Fangraphs, it took me literally under 5 minutes to run the study. I exported everyone with 600+ innings and removed catchers. I turned everyone into runs per 27 outs. I correlated year T to year T+1, matching on player and position. This left me with 635 matched players.
First, how does each correlate to itself? For FRV (that's the Statcast version), it's at r=0.60. For DRS, it's r=0.50. This is a pretty good sign that FRV is better able to isolate the players (though you might argue I haven't proven that I've taken care of parks, so maybe I should look at team switchers... someone out there can pick that up).
Now, how does DRS correlate with FRV? In other words, can DRS explain FRV? That correlation is r=0.38. That's not bad. It shows that DRS sees itself and FRV differently enough, though it's still able to explain a good portion of FRV.
How about FRV explaining DRS? That correlation is r=0.40. That's not bad as well. The same explanation holds, though in this case, FRV is able to explain itself to a higher degree than DRS can explain itself, all the while being able to explain DRS slightly better than DRS can explain FRV.
The knockout punch isn't there. It would have been great if FRV would have had a correlation of r=0.50 to DRS (and thereby matching the DRS correlation to itself of 0.50). That didn't happen, with an r=0.40 instead. It would have been interesting had FRV had a correlation of r=0.38 with itself as that's the knockout punch DRS would need. That obviously didn't happen, as it instead had an r=0.60.
So, there's enough here to suggest that both have value, though the value is stronger with FRV.
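If you want to pick this up yourself, the matching step can be sketched as below. This assumes you have already exported the data and converted it to runs per 27 outs; the row layout, the field names (`frv`, `drs`), and the toy numbers are all made up for illustration, not taken from the Fangraphs export.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year(rows, metric_t, metric_t1):
    """Correlate metric_t in year T with metric_t1 in year T+1,
    matching on player AND position. Returns (r, matched pairs)."""
    by_key = {(r["player"], r["position"], r["year"]): r for r in rows}
    xs, ys = [], []
    for (player, position, year), row in by_key.items():
        nxt = by_key.get((player, position, year + 1))
        if nxt is not None:
            xs.append(row[metric_t])
            ys.append(nxt[metric_t1])
    return pearson_r(xs, ys), len(xs)


# Toy rows (hypothetical values, runs per 27 outs). Player D changes position,
# so he drops out of the matched sample.
toy = [
    {"player": "A", "position": "SS", "year": 2022, "frv": 1.0, "drs": 0.8},
    {"player": "A", "position": "SS", "year": 2023, "frv": 0.5, "drs": 0.4},
    {"player": "B", "position": "CF", "year": 2022, "frv": 0.0, "drs": 0.2},
    {"player": "B", "position": "CF", "year": 2023, "frv": 0.0, "drs": 0.1},
    {"player": "C", "position": "2B", "year": 2022, "frv": -1.0, "drs": -0.5},
    {"player": "C", "position": "2B", "year": 2023, "frv": -0.5, "drs": -0.6},
    {"player": "D", "position": "SS", "year": 2022, "frv": 0.3, "drs": 0.3},
    {"player": "D", "position": "3B", "year": 2023, "frv": 0.2, "drs": 0.1},
]

r_self, n = year_to_year(toy, "frv", "frv")    # FRV explaining itself
r_cross, _ = year_to_year(toy, "frv", "drs")   # FRV explaining next year's DRS
print(n, round(r_self, 2), round(r_cross, 2))
```

Swapping the two metric names gives the four correlations discussed above: each metric against itself, and each against the other.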
INFIELDERS v OUTFIELDERS
Now, the FRV method is really an Outfield and Infield method, two separate methods. I suspect that DRS likely has two somewhat distinct methods. So, let's repeat all that, but look at infielders-only and outfielders-only.
With the Infield, DRS correlates with itself at r=0.49, while FRV is at r=0.46. Slight advantage to DRS for self-correlating better. DRS correlates with next season's FRV at r=0.30, while FRV correlated with next season's DRS at r=0.28. Overall, it certainly looks like DRS has a slight advantage. I'd probably call it 55/45 for DRS here. If you want to call it 60/40 in favor of DRS, ok, I won't argue. DRS has two things going for it, one is the DP handling and the other is the little things, the nuances, of playing the infield (like relays, and other subjective calls).
How about the outfield? Well, get ready to get your mind blown here. FRV correlates with itself at r=0.73, while DRS self-correlates at r=0.51. I mean, this is just no contest at all.
But, it's not just that. I'll give you the knockout punch as well. FRV correlates with next-season's DRS better than DRS correlates with itself next-season: r=0.53 to r=0.51. Let that sink in for a bit. FRV knows nothing about DRS, knows nothing about how DRS measures things. And yet, it can predict next season's DRS better than DRS can (for outfielders).
Indeed, DRS can predict next season's FRV almost as well as it can predict itself: r=0.51 for self-correlation and r=0.49 for correlating FRV.
Why does this happen? Because the starting point of the outfielder and how much distance they have to cover is critical, and Statcast can precisely measure this.
In terms of weighting, I'd have to go at least 90/10 for FRV, if not 100/0.
It's clear that in the off-season, my time should be spent much more with infielders, and handling all those extras that I've been putting off. DRS deserves its flowers there.
Cleveland was ahead by 1 run in the bottom of the 5th, 1 out, runners on the corners. Chance of winning is .776
Runner on 1B attempted to steal 2B.
Choices for the Rays were:
Let runner steal uncontested, keep runner at bay, leaving runners at 2B+3B, chance of winning .795
Throw to 2B, allow runner to score
CS means .795
SB means .833
As you can see, the win value of the runner at 3B gaining a base is exactly equal to the win value of the runner on 1B being thrown out. Which makes sense, since the run value of going from 3B to home plate is about plus 0.4 runs, and the run value of the CS is about minus 0.4 runs.
Of course, this ONLY makes sense if the CS was a guaranteed out. Otherwise, having the runner steal 2B and the runner scoring would be a disaster play.
End result: the catcher should NOT have attempted to throw the runner out, and he got lucky to break even on that play.
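To see why the throw can't help, weight the two throw outcomes by the chance of getting the out. A minimal sketch, assuming (as in the scenario above) that the runner from 3B scores whenever the catcher throws; the win values are Cleveland's, taken from the text, and the out probabilities are illustrative:

```python
WE_NO_THROW = 0.795  # uncontested steal, runners on 2B+3B
WE_CS = 0.795        # throw: runner out at 2B, runner from 3B scores
WE_SB = 0.833        # throw: runner safe at 2B, runner from 3B scores


def cleveland_we_if_throw(p_out):
    """Cleveland's expected chance of winning when the catcher throws,
    given the probability that the throw retires the runner."""
    return p_out * WE_CS + (1 - p_out) * WE_SB


for p_out in (0.5, 0.75, 0.9, 1.0):
    we = cleveland_we_if_throw(p_out)
    print(f"P(out)={p_out:.2f}: Cleveland at {we:.3f} vs {WE_NO_THROW:.3f} with no throw")
```

Only a guaranteed out gets the Rays back to even. Any chance of a safe call leaves Cleveland better off than if the catcher had held the ball.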
Logan Webb is 2nd in MLB in IP with 189.2, with a not great, but good ERA of 3.46
Paul Skenes (120 IP) is 2nd in MLB in ERA (min 80 IP), with a not great, but fantastic ERA of 2.10
Skenes has been charged with 28 ER and Webb 73. The difference between the two is 45 ER and 69.2 IP, which is an ERA of 5.81.
Of the 141 pitchers with at least 80 IP, the bottom 10% have an average ERA of 5.83. This is illustratively, if not exactly, what we mean when we talk about Replacement Level (or Readily Available Talent Level) in WAR. In other words, what is the minimal level of performance at which you can play in MLB and provide the minimal level of value where you would earn the minimum salary.
All those innings that Skenes didn't pitch, and that provided no value, are essentially as if he had thrown 69.2 IP and allowed 45 ER. That is the no-value level of performance. It neither adds to, nor subtracts from, his overall value. We added 0-value to his outstanding 2.10 ERA in 120 IP by adding 45 ER and 69.2 IP.
And when we do that, when we add no value of 45 ER and 69.2 IP, we end up with Logan Webb, and his 3.46 ERA in 189.2 IP.
In other words, Paul Skenes and Logan Webb generated the same amount of value, even if they got there in very different ways.
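The arithmetic is easy to reproduce. A minimal sketch, using the ER and IP totals quoted above (the helper names are mine; note that 189.2 in baseball notation means 189 and two-thirds innings):

```python
def innings(ip_text):
    """Convert baseball innings notation ('189.2' = 189 2/3) to a number."""
    whole, _, thirds = ip_text.partition(".")
    return int(whole) + int(thirds or "0") / 3


def era(earned_runs, ip):
    """Earned runs per 9 innings."""
    return 9 * earned_runs / ip


skenes_er, skenes_ip = 28, innings("120")
webb_er, webb_ip = 73, innings("189.2")

# The "missing" innings and runs that separate the two pitchers.
gap_er = webb_er - skenes_er
gap_ip = webb_ip - skenes_ip

print(f"Skenes ERA: {era(skenes_er, skenes_ip):.2f}")
print(f"Webb ERA:   {era(webb_er, webb_ip):.2f}")
print(f"Gap: {gap_er} ER in {gap_ip:.1f} IP, an ERA of {era(gap_er, gap_ip):.2f}")
```

The gap pitcher comes out at 5.81, right next to the 5.83 average of the bottom 10%, which is the illustration of Replacement Level.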
The framework of WAR is perfect. Indeed, I've adapted the concept of WAR in baseball to create frameworks for hockey, basketball, and volleyball. Doing the same for football and futbol and cricket and any other sport is easily solvable. Since I first developed WAR about twenty years ago, and with the benefit of hindsight, there is nothing that I would change about its framework.
I would tweak its presentation, but back then, I was speaking to a small group of folks, and those are just surface-level details. Had I known it would take off the way it did, I would have had a better presentation, notably the Individualized Won-Loss Records (aka The Indis).
So, framework is perfect. Presentation can be improved. What about the implementation?
The implementation is what you see on Fangraphs and Baseball Reference: they take the framework and then actually build something with it. The framework is the design, almost blueprint. But to actually build WAR, well, there are things not noted in the blueprints, like nails and screws and age of the wood and types of pipes and lighting fixtures. Those are implementation details.
In other words, there are a lot of choices you have to make, big and small, in order to take a blueprint and turn it into a house.
This is easiest shown by looking at the differences in the two main implementations of WAR. On Fangraphs, their pitching WAR is centered around FIP. On Reference, the starting point is Runs Allowed (RA/9). The framework of WAR doesn't insist on anything in this regard. It is a feature, not a bug, that allows two implementations to exist as they do, with the big difference in choices made.
There are smaller choices as well. Park factors are encouraged and considered in the framework of WAR. But how to actually calculate park factors? That's an implementation choice. Again, feature, not bug.
Should you consider performance with runners on base? Sure, the framework of WAR allows for it. The two main implementations, Fangraphs and Reference, are of similar mindset when it comes to batters in this regard. But nothing is stopping anyone (other than time) from making a different choice. You could use RE24 (run expectancy by the 24 base-out states). You could use WPA (win probability added). You could decide that you are not really sure about any of these, but want to give some consideration to each of them: so, you could, for example, give 10% weight to WPA and 40% weight to RE24 and 50% weight to wOBA. You can literally make any small choice you want.
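As an illustration of how small such a choice is in practice, here is a sketch of the 10/40/50 blend. It assumes all three inputs have already been converted to the same unit (runs above average), which is itself an implementation choice; the function name and the sample inputs are hypothetical.

```python
def blended_runs(woba_runs, re24_runs, wpa_runs, weights=(0.5, 0.4, 0.1)):
    """Weighted blend of three batting-value estimates, all in runs above average.
    Default weights follow the example in the text: 50% wOBA, 40% RE24, 10% WPA."""
    w_woba, w_re24, w_wpa = weights
    if abs(w_woba + w_re24 + w_wpa - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_woba * woba_runs + w_re24 * re24_runs + w_wpa * wpa_runs


# Hypothetical batter: +50 runs by wOBA, +40 by RE24, +30 by WPA (converted to runs).
print(round(blended_runs(50, 40, 30), 1))
```

Change the weights and you have a different, equally legitimate implementation of the same framework.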
The implication of these choices will compound when you start to focus on individual players. See, for the majority of the players, whatever choices you make, it's going to cancel out. You make a dozen different choices in your implementation of WAR, and it'll help Aaron Judge a bit in seven of them, and hurt him in five. Or it'll help Bobby Witt Jr in four of them and hurt him in eight. You can modify your choices so that it helps Witt more than Judge. And every now and then, some of your choices will overwhelmingly favour one or three players. Naturally, you aren't building your WAR metric to want to do that. But, it'll happen. Again, feature, not bug. That's because these choices are opinions. Sure, they are fact-based opinions, but still opinions.
What the WAR framework does is insist on a systematic, unbiased, consistent process, rather than an arbitrary, biased, and capricious whim. Your personal WAR, whatever it is, is the latter. The WAR framework simply forces your opinion to follow a process.
As those who follow me know, the Cy Young Predictor has worked spectacularly well. Until Corbin Burnes won it in his FIP year. I created a FIP-enhanced version as well, given that we may be in a paradigm shift.
Chris Sale (and Tarik Skubal) are running away with the predictor using the FIP-enhanced version. Skubal is ALSO running away with it with the classic predictor. So, we won't learn anything there.
However, Sale is barely holding back Wheeler with the Classic Predictor. This means that Wheeler has a chance for an upset here... as long as there are enough old-school voters whose behaviour is being captured by the Classic Predictor.
How many of the 30 voters are Classic voters? I don't know, but let's say that there are 20 Classic voters and 10 FIP-enhanced voters. This means that Wheeler is already 0-10, and he needs to perform well enough over his next 5 starts (and/or Sale pitch poorly enough) that Wheeler can get 16 of the 20 Classic voters. Wheeler and Sale are going to get all 1st and 2nd place votes, regardless of mindset.
In order for Wheeler to get 16 of 20 votes, he probably has to lead with the Classic Predictor by about 5 points. Right now, Sale is ahead in the Classic scoring by 1.4 points. So over the next 5 starts (assuming they each get 5 more starts), Wheeler needs to get about 6 or 7 more points than Sale.
How doable is that?
Sale is averaging 13.5 points per 5 starts with a standard deviation of 4.9 points per 5 starts. Wheeler is at 12.25 points and 6.3, respectively. For the difference of two distributions, the standard deviation is the root sum of squares (RSS) of the two, so one standard deviation is 8 points.
With Wheeler 1.4 points behind already, and 1.25 points expected behind over the next 5 starts, he's 2.65 points behind and he needs to be about 5 points ahead, or a swing of almost 8 points.
In other words: one standard deviation. Which will happen about 16% of the time.
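Here is that rough calculation as a sketch. It assumes the two pitchers' point totals over the next 5 starts are independent and roughly normal, which is my simplification for illustration, not something the Predictor guarantees:

```python
from math import hypot
from statistics import NormalDist

sale_mean, sale_sd = 13.5, 4.9         # points per 5 starts
wheeler_mean, wheeler_sd = 12.25, 6.3  # points per 5 starts
current_deficit = 1.4                  # Sale's current lead in the Classic scoring
needed_lead = 5.0                      # lead Wheeler probably needs to get 16 of 20

# SD of the difference of two independent totals: root sum of squares.
sd_diff = hypot(sale_sd, wheeler_sd)

# Where Wheeler is expected to sit after 5 more starts, and how far he must swing.
expected_deficit = current_deficit + (sale_mean - wheeler_mean)
required_swing = expected_deficit + needed_lead

z = required_swing / sd_diff
p_upset = 1 - NormalDist().cdf(z)

print(f"SD of difference: {sd_diff:.2f} points")
print(f"Required swing:   {required_swing:.2f} points (z = {z:.2f})")
print(f"Chance of upset:  {p_upset:.0%}")
```

The required swing is a shade under one standard deviation, so the sketch lands a touch above 16%, well inside the precision the rough inputs allow.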
Of course, all this is pretty rough, and if you want to say 10% or 15% or 20% or 25%, that's fine. I can't really give you that precision.
I can tell you the current market is at 82% for Sale and 18% for Wheeler. So, it seems that the market is basically in line with the Predictor.
Fedde is at 3.9 wins above average (WAA), which is the same as the eventual NL Cy Young winner Chris Sale, and 0.1 wins below the eventual AL Cy Young winner Tarik Skubal. Hunter Greene leads at 4.3.
His WAR also follows similarly: 5.5 for Greene, 5.4 Skubal, 5.2 Fedde and Sale.
Fangraphs has Fedde at 2.9 WAR, tied for 24th, with the eventual Cy Young winners as 1-2: Sale 5.7 and Skubal 4.8.
So, what is going on here, how does Reference have Fedde squeezed in between Sale and Skubal?
For that, we have to give thanks to Sean Forman and his team for being ridiculously transparent about it all. Not only do they give you the step by step explainer for WAR, but then they present it component by component so we can understand what is going on.
The first thing to know is that Reference doesn't care about SO and BB and HBP and HR. What they principally care about is Runs Allowed. Not ERA, but RA/9.
Let's compare Fedde to Sale directly. Fedde has 1 more IP than Sale, while giving up 15 more runs (and 15 more ER for that matter). Right off the bat, we start with Fedde 15 runs behind Sale.
So how does he make up that difference? That 1 more IP gives him a 0.5 run advantage.
The first thing that jumps out here is the fielding support: Sale is being charged with 0.11 runs per 9 IP of fielding support, while Fedde is supposedly hurt with -0.41 r/9 of fielding support. That is a gap of 0.52 runs per 9 IP. And since they've each pitched the equivalent of 17 9-inning games, then 17 x .52 = 9 runs.
Is it possible for two pitchers to have a gap of 9 runs in fielding support? I actually track that right here:
Eovaldi and Bassitt have benefitted from 9 runs of fielding support above average when they were on the mound (that last part is key). Stroman and Spence have been hurt by 9 runs. So, comparing these pitchers specifically, we have an 18 run gap, which is huge. Therefore, a 9 run gap between two pitchers, while noteworthy, is reasonable.
The gap between Fedde and Sale however is only 4 runs... and it is FEDDE that has been getting better fielding support.
See, the difference in the two approaches is that on Savant we track the fielding support while that pitcher is on the mound. On Reference, it is a team-level adjustment. So, regardless of how the Braves fielders did with Sale on the mound, what matters is what the Braves fielders did for ALL their pitchers. Then that is apportioned out to each Braves pitcher. This is akin to a great hitting team counting as the same offensive support, even if in games pitched by one pitcher they only scored 3 runs per game and they scored 6 runs for another pitcher. When you make an overall team-level adjustment, ALL the pitchers are treated with the same run support. And that's what's happening here with the fielding support.
Indeed, Fedde has a .263 BABIP, while Sale is at .317. While not dispositive, it certainly argues in favor that Fedde has not been hurt by his fielders, while Sale has been. Which is what the Savant play-by-play evaluation supports (not to THIS extent, but to some extent).
Anyway, let's keep going.
Fedde is treated as pitching more in batter's parks, while Sale is neutral. I won't look into it any further, but let's assume this is accurate. The net impact is about 3% of runs, and so that's about 2 runs.
Fedde also faced tougher competition. Again, let's assume this is accurate. Reference shows an advantage of 0.13 runs per game, which works out to another 2 runs.
Let's add it up:
0.5 runs: IP advantage to Fedde
9 runs: fielding support to Fedde
2 runs: park support to Fedde
2 runs: opponent quality to Fedde
Add it up and it's 13.5 runs. That's close enough to the 15 runs that we've pretty much explained why Reference loves Fedde.
But, that fielding support number is what is carrying all the weight here. As I said, it should go 4 runs the other way. And once you do that, then all those components end up cancelling out down to .... 0 runs.
And we are left with Sale being 15 runs ahead of Fedde.
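The two ledgers can be laid side by side. A minimal sketch using the rounded component values above; the only thing that changes between the two versions is the fielding line, which flips from +9 (the team-level adjustment) to -4 (the on-the-mound tracking):

```python
raw_gap = 15.0  # runs Fedde allowed beyond Sale, before any adjustment

# Adjustments in Fedde's favor (runs), as Reference's components imply.
reference_view = {"innings": 0.5, "fielding": 9.0, "park": 2.0, "opponents": 2.0}

# Same ledger, but with fielding support measured while each pitcher was on the mound.
on_mound_view = dict(reference_view, fielding=-4.0)

for label, ledger in (("team-level fielding", reference_view),
                      ("on-the-mound fielding", on_mound_view)):
    credit = sum(ledger.values())
    print(f"{label}: credit to Fedde {credit:+.1f}, Sale still ahead by {raw_gap - credit:.1f}")
```

With the team-level fielding number, nearly the whole 15-run gap disappears. With the on-the-mound number, the adjustments net out to about nothing and the gap stays.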
In order to buy into the Reference WAR, you have to buy into two things:
1. The overall fielding evaluations at the team level are correct
2. The partitioning of these evaluations at the pitcher level is fair
Unfortunately, there is no uncertainty level in these adjustments. And so, you end up with isolated issues like Fedde v Sale every year.
As a result, single-season WAR may be 90% reliable, but you have some one-offs like these that are off-putting.
That said, things like this work themselves out over a period of years. Being off by 1 or 2 wins here or there might be bothersome at the seasonal level, but it ends up not really mattering at the career level.
I should also mention that I love Reference, it is an indispensable site for both me and the industry.
I introduced Leverage Index about twenty years ago. One of the early things I did with it back then, which has not really been followed-thru by anyone, is Re-Leveraging the data. I will explain what that means, using Aaron Judge as the example.
Leverage Index (LI) is simply a measure of how much impact that particular moment has on the game, in real-time. The average moment is 1.000. The highest leveraged moment (think bottom of the 9th, bases loaded, down by a run or two) will be around 10. Naturally, you can have an LI approach 0 in a blowout.
The top ace reliever will average an LI of 2.0, basically saying that the moment they come into the game has twice the impact as a random moment in a game.
AARON JUDGE
Aaron Judge, because he plays for the Yankees, and because games seem to be decided one way or the other earlier than normal, has an LI of only 0.9. That's not a reflection of HIM, but rather his circumstances. Right away, we can see that whatever he does, on average, it will be depressed by 10%. We'll take care of that in a moment.
The most crucial moment that Aaron Judge hit a HR is with an LI of almost 4, which is quite high. He has three more HR with an LI of around 2. Another 13 HR with an LI above 1. Another 13 with an LI above 0.5. Then 21 more HR with an LI of under 0.5. The average LI of when he hits a HR is only 0.78. This is much lower than his average circumstance of an LI of 0.9. When folks say that Judge hits a lot of useless HR, this is what they are actually saying. How many useless HR is he hitting? I'll get back to that in a moment.
He has 31 doubles and triples. The average LI of those is 0.77, pretty much the same as his HR. This is not looking good for Judge. So far, his extra base hits are coming in substantially lower-leverage situations, even accounting for his overall low-LI to begin with.
His singles have an LI of 0.91, which is almost the same as his overall average LI. His unintentional walks and HBP are at 0.84. His outs are also at an LI of 0.91.
Ok, so we have our evidence that Judge is actually not rising to the occasion. How can we measure that?
RE-LEVERAGING
When Judge hit that high-LI HR, the one with the LI of almost 4, that in essence meant that this plate appearance will swing the outcome of the game 4X as much as a random plate appearance. In other words, it's practically as if he had a 4-PA game in one PA. And so when he hit the HR in this situation, it is essentially as if he went 4-4 with 4 HR. And that's what we'll do: we will leverage this single PA and single HR as a 4PA event, counting it as 4 HR.
Of course, when he hits a HR in a 0.01 LI circumstance, that will count as 0.01 PA and 0.01 HR.
When we apply this to all his plate appearances, we end up with 491 plate appearances (instead of his actual 561, sans IBB). In order to properly re-leverage, we will bump up all his leveraged-stats by ~14% (561/491), so that we end up with 561 re-leveraged PA.
And when we do that, what happens? His actual 51 HR are re-leveraged as 45.4 HR. In other words, he loses 5.6 HR. And so we can say 5.6 of his HR are useless.
His 31 2B+3B become 27 when re-leveraged. So he loses 4 more extra-base hits. He gains 4 singles, loses 3 walks+HBP. And gets an extra 11 outs.
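The re-leveraging procedure itself is short. Here is a minimal sketch: count each plate appearance as LI-many plate appearances, then rescale so the total matches the actual PA count. The six-PA sample is made up purely to show the mechanics; it is not Judge's data.

```python
from collections import defaultdict


def releverage(plate_appearances):
    """plate_appearances: list of (event, leverage_index) pairs.
    Returns re-leveraged event counts that sum to the actual number of PA."""
    weighted = defaultdict(float)
    for event, li in plate_appearances:
        weighted[event] += li  # a PA with LI of 4 counts as 4 of that event
    leveraged_pa = sum(weighted.values())
    scale = len(plate_appearances) / leveraged_pa  # restore the actual PA total
    return {event: total * scale for event, total in weighted.items()}


# Hypothetical batter whose HR come in low-leverage spots.
toy = [("HR", 0.5), ("HR", 0.3), ("OUT", 1.2), ("OUT", 1.0), ("1B", 0.9), ("BB", 0.6)]
result = releverage(toy)

for event, count in result.items():
    print(f"{event}: {count:.2f}")
```

In the toy sample the batter's 2 actual HR shrink to about 1.07 re-leveraged HR, while his 2 outs grow to about 2.93: the same kind of movement described above.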
In the end, his actual wOBA of .497 ends up being re-leveraged as .467. This is a 30 point drop in wOBA, which we can easily convert to runs: divide by 1.2 and multiply by his PA of 561 to give us a loss of 14 runs.
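The conversion can be sketched in a couple of lines. The 1.2 divisor and the 561 PA are from the text; the 10 runs per win is the ratio implied by treating 14 runs as 1.4 wins, and the names here are illustrative.

```python
WOBA_SCALE = 1.2    # divisor used above to turn wOBA points into runs per PA
RUNS_PER_WIN = 10   # implied by 14 runs being treated as 1.4 wins


def woba_gap_to_runs(woba_gap, pa, scale=WOBA_SCALE):
    """Convert a wOBA difference over a number of PA into runs."""
    return woba_gap / scale * pa


runs_lost = woba_gap_to_runs(0.497 - 0.467, 561)
wins_lost = runs_lost / RUNS_PER_WIN

print(f"{runs_lost:.1f} runs, or about {wins_lost:.1f} wins")
```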
IMPACT
In other words, whatever context-neutral value you may have as his run production, you need to drop it by 14 runs in order to properly account for the game situation. These are 14 runs that Aaron Judge did contribute to, but that the Yankees did not benefit from. So, when you translate his performance into wins, via WAR, you can consider removing 1.4 wins from his total. It all depends on whether you think it matters if his performance impacts a game in real-time or whether the circumstances are irrelevant. If the impact matters, then remove 1.4 wins. If the circumstances are irrelevant, then keep those rose-colored glasses on, I don't want to keep you from enjoying your own reality.
I will say this: the choice usually depends on how it affects your player. Had his re-leveraged performance gained him 1.4 wins, I am sure his legion of fans would accept the premise of Re-Leveraging.
"A walk is as good as a hit" is essentially a true statement when the bases are empty. Which has been true for most of baseball history (with the exception being the extra inning placed runner, the XIPR).
In a Markov chain, the presumption is how you entered a state is immaterial. Being in a state is the information you need in order to know what's to come. So, if you have a runner on 1B with 0 outs, does it matter HOW you got there? If it doesn't, then that's your Markov state: runner on 1B, 0 outs. If it DOES matter, then your Markov state has to include how you got there, so that your actual Markov state is 1B-or-BB-or-HBP-or-Err, and the runner on 1B and 0 outs.
In an award-winning presentation at SABR52, Bailey Hall tackled that issue. The main overall point is that the number of runs that followed the runner on 1B, 0 outs state was essentially the same, regardless as to how the state was entered (0.94 to 0.93 runs following a leadoff BB or single respectively). But, Bailey did note that there may be a pitcher-by-pitcher effect, that maybe some pitchers are more affected by one or the other, and maybe even at the inning-level.
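For an #AspiringSaberist who wants to try this kind of check, the core of it is a group-by: take every inning that reached runner on 1B, 0 outs, split by how the state was entered, and compare the average runs that followed. A minimal sketch with made-up rows; this is not Bailey Hall's code or data.

```python
from collections import defaultdict


def mean_runs_by_entry(innings):
    """innings: list of (entry_event, runs_scored_after_reaching_the_state).
    Returns the average runs that followed, for each way of entering the state."""
    totals = defaultdict(lambda: [0, 0])  # entry_event -> [runs, count]
    for entry, runs in innings:
        totals[entry][0] += runs
        totals[entry][1] += 1
    return {entry: runs / count for entry, (runs, count) in totals.items()}


# Hypothetical innings that all reached "runner on 1B, 0 outs".
toy = [("BB", 1), ("BB", 0), ("1B", 2), ("1B", 0), ("1B", 1), ("BB", 2)]
averages = mean_runs_by_entry(toy)
print(averages)
```

If the averages match across entry events, as they essentially did in the presentation (0.94 to 0.93), the simple Markov state is enough. Splitting the same table by pitcher or by inning is the natural next step.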
Most important to all this is that the question was asked, a solution has been offered, and the presentation is beyond outstanding (with pure baseball themes wherever you look). This is what an #AspiringSaberist should do: ask the question, roll up their sleeves, and show off the work. Because others will be watching, and they will remember any good work.