Having laid out the process and results for batters, we can now turn our attention to pitchers.
Now, the news is not going to be great. It'll be good, in that this is an advance. But any expectation that we'd uncover anything big would have been highly misguided.
We start with our two baselines. First, how well does wOBAcon (year T) predict wOBAcon (year T+1)? The year to year correlation (minimum 100 batted balls, average of 264) is an r=.08. This sets the ballast (regression toward the mean amount) to a whopping 3000 batted balls. In other words, once a pitcher accumulates 3000 batted balls (which is 5 or 6 full seasons for a starting pitcher), his observed wOBAcon represents about half the pitcher talent and half Random Variation. This is what we are up against when it comes to batted balls. This is why DIPS exists, and this is why FIP (which is a measure that ignores most batted balls, only keeping HR) has staying power.
How does xwOBAcon do, which you can already see on the Savant player pages and in Search? Remember that xwOBAcon is an estimate of the PLAY, looking at the combination of launch angle and speed, with combination being the key word. In that case, the r=.11, which is a slight gain over just using wOBAcon. The ballast is 2000 batted balls. That's actually pretty good, as now we're down to needing 3 or 4 full seasons for a starting pitcher to find half of their true talent.
Let's now go to each layer, starting with the Launch Speed Layer. Now, remember, we are not looking at Launch Speed itself, but the translation of Launch Speed into wOBA. It is not 1:1. Low launch speeds, those under 88 mph, are all equivalent. And launch speeds over 100 mph get progressively more impactful (up to a point). All that work behind the scenes gives us a wOBA_layer based strictly on Launch Speed. And what is the year to year correlation of Launch Speed Layer (year T) to wOBAcon (year T+1)? That is ALSO an r=.11. That's the same as good ole xwOBAcon. Remember, xwOBAcon uses both launch speed and launch angle, in unison, to describe the PLAY. And it obviously does that better than the Launch Speed Layer on its own. But to describe the PLAYER? Well, as we can see by the results, the Launch Angle is just about entirely Random Variation (when used in combination as xwOBAcon does).
When we use ALL the layers, our correlation jumps to r=.16, and our ballast drops all the way down to 1400 batted balls, meaning we need fewer than 3 full seasons for a starting pitcher to find half their talent. The half-full glass view is that we've drastically improved from needing 5 to 6 seasons to now only needing half as many. The half-empty view is that we still need almost 3 seasons, when we'd really like to have only 1.
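The three ballast figures quoted so far are consistent with the usual relationship r = n / (n + ballast), which rearranges to ballast = n(1 - r)/r. Here is a minimal sketch, assuming that relationship is the one in play (the function name is only for illustration):

```python
def ballast(avg_sample: float, r: float) -> float:
    """Sample size at which the observed rate is half talent, half noise.

    Assumes r = n / (n + ballast), so ballast = n * (1 - r) / r.
    """
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    return avg_sample * (1 - r) / r


# Average of 264 batted balls per pitcher-season, as quoted in the text.
for label, r in [("wOBAcon", 0.08), ("xwOBAcon", 0.11), ("all layers", 0.16)]:
    print(f"{label}: r={r:.2f} -> ballast of about {ballast(264, r):.0f} batted balls")
```

Under that assumption the results land close to the rounded 3000, 2000, and 1400 figures above.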
Let's go through it layer by layer. Layer for Launch Angle is half the impact of that of Launch Speed, and is basically what causes it to go from r=.11 to r=.16. The difference between the layered approach and the xwOBAcon combo approach is that we are better able to isolate launch angle from launch speed.
As for the other layers, all their p-values are quite high, making them all pretty meaningless. But let's go through them anyway. The batter run speed actually has a negative correlation, but with a p-value of .36, it really could just as well be 0. The fielding alignment has a p-value of .48, and a slightly positive correlation. So again here, the fielding alignment doesn't really carry over in predictability. The spray angle has a slightly negative correlation and an even higher p-value (.62). The Fielder Performance layer is slightly positive, but at a p-value of .93, well, we can easily ignore it. And the Carry Layer, at a p-value of .94 and a coefficient of almost 0, is as meaningless as it gets.
So there you have it, after all is said and done, the two things we care about, Launch Angle and Speed are the two things already in xwOBAcon since its inception. The paradigm shift is to layer them to better isolate them, rather than combine them from the outset. And it gives us a half-glass impact in evaluating pitchers.
(In re-reading this, I have a lot of ALL CAPS. I'm not shouting, just emphasizing. I can edit it to be small case bold if this bothers anyone.)
As I'm getting near the end of preparing Layered Hit and HR Probability, let me now turn my attention to Layered wOBA (or more specifically because it's on Batted Balls or Contacts, it's actually wOBAcon).
Layered wOBAcon requires the probability of each 1B, 2B, 3B, HR for each layer.
In order to have a baseline, let's just look at how wOBAcon and xwOBAcon correlate with next season's wOBAcon. Among batters with 150 PA in back to back seasons, for the 2021-24 seasons, the correlation of wOBAcon (year T) to wOBAcon (year T+1) is r=.49. With an average sample of 330 PA, we also learn that you want to add ~330 PA of league average wOBAcon (that's the prior) to the current year wOBAcon (that's the observed) to estimate next year's wOBAcon (that's the posterior). Remember that stats class where for 13 weeks they went thru the horribly named Beta Distribution with the even more horribly named alpha and beta parameters? Yeah, this is what they were talking about.
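The prior/observed/posterior arithmetic described above can be sketched as a simple weighted average. This is an illustration only: the batter's .420 is made up, and the .360 league average is a rough placeholder, not a figure from this analysis.

```python
def regress_to_mean(observed: float, n: float, league_avg: float, ballast: float) -> float:
    """Blend the observed rate over n PA with `ballast` PA of league-average performance."""
    return (observed * n + league_avg * ballast) / (n + ballast)


# Hypothetical batter: .420 wOBAcon over 330 PA, with ~330 PA of ballast.
estimate = regress_to_mean(observed=0.420, n=330, league_avg=0.360, ballast=330)
print(f"Estimated next-season wOBAcon: {estimate:.3f}")
```

When the sample size equals the ballast, the estimate sits exactly halfway between the observed rate and the league average, which is what an r of about .5 implies.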
As for xwOBAcon (year T) to wOBAcon (year T+1), that's an r=.60. So, add yet another +1 in the win column for x-stats better describing the talent of players than their actual stats do. Whether W/L v ERA correlating to next year's W/L, or ERA v FIP correlating to next year's ERA, or wOBAcon v xwOBAcon correlating to next year's wOBAcon, it's all part of the same pattern: the observed stats are filled with so much noise (Random Variation) that it hides the actual thing they are purportedly trying to measure.
Alright, so let's get back to Layered wOBAcon. The first layer we have is Launch Speed. How does Layered Speed (year T) correlate to wOBAcon (year T+1)? That's an r=.60! Whoa, that's the SAME as xwOBAcon? What's going on here?
Welcome to my world, where for the last 8 years I've been discussing and describing and otherwise deliberating the PLAY v the PLAYER. When Statcast came out there was this enormous rush to take the specifics of a play (launch speed and launch angle, notably) and use that information to purportedly describe what the player did, when it was in fact simply describing the PLAY. This should have been plainly obvious when it came down to looking at 70-80mph batted balls launched just high enough that you'd get a high hit probability: those balls would land over the infielder and in front of the outfielder. Outside of maybe Ichiro and Arraez, NOBODY intends to do that. Every single batter is trying to hit the ball hard, at least 90mph. And so, batted balls hit at under 80mph are undoubtedly mistakes. There are of course mistakes that lead to good outcomes and mistakes that lead to bad outcomes. But from the perspective of the talent of the player, these are better bundled together as launch speed mistakes.
Similarly, you have what scouts call Major League Outs: these are batted balls that are hit 100+ mph, but at such a high launch angle (45+ degrees) that it ends up being a very high fly out. These are better addressed as launch angle mistakes. It takes TREMENDOUS power to mishit a ball to get a 45 degree launch angle and still hit the ball 100+. If you have a batter already at the major league level, these launch angle mistakes are far easier to overcome than launch speed mistakes.
What happens with the x-stats that bundle things together, like xwOBAcon does, is that it is only focused on the PLAY. And so, xwOBAcon looks at the outcome of that combo of speed+angle, and based on the historical outcome of that combination decides how good a hit that was. Doing that removes the individuality of each of the speed and angle.
In other words, this combination approach is actually adding what is analogous to Random Variation in trying to describe the player by essentially overfitting on the play. From the perspective of the play, it's not an overfit. From the perspective of the player, it IS an overfit. We need a paradigm shift here.
This is where a Layered approach comes in. First, we focus on the primary thing that will describe the PLAYER (launch speed) and then we do our best to describe the PLAY. And incredibly, we ALREADY achieve an r=.60 doing only that.
The next layer we add is Launch Angle. Doing that gives us a small boost to r=.64. Adding the Launch Angle as a layer in this manner now allows us to better describe THE PLAYER. Sure, we lose some value in describing THE PLAY, but that's a small (temporary) loss.
From here on out, we can add each layer, one at a time (Carry, Spray Angle, Batter Running Speed, Fielding Alignment, Fielder Performance) so that we can TOTALLY describe the PLAY. You see in this paradigm shift, by accounting for every variable, we will get an r=1 in terms of describing the hit or out. And by leaving it as Layers, we can then decide which are actually the ones we care about in describing the PLAYER.
The most impactful is the launch speed, as we already presumed and surmised. The batter's running speed is also important: this is really a trait ingrained to the player. See, this is what we are after here, to establish a tool or trait for each player. Launch Speed is a powerful trait (a combination of Bat Speed and Quality of Contact), and at the major league level, we've already selected for players to have decent Quality of Contact. Running speed is a natural trait as well.
The next on the list is Launch Angle, at about 20% the weight of Launch Speed.
The Carry layer is impactful, but in a negative sense. While we can describe the individual plays by how much Carry the ball has (whether it's by the spin imparted, or the wind or the specific traits of that particular snowflake of a ball), these actually are not helping in describing the batter.
The Spray Layer has almost no weight at all. With a p-value of 0.51, it becomes an easy feature to ignore. Yes, you need it to describe THE PLAY. But when it comes to describing the effectiveness of the player, we don't need the Spray Layer. Yes, it becomes useful to describe the PROFILE of the player (pull, spray, etc), but not their overall performance.
The Fielding Alignment Layer has no weight at all in describing the player.
So, there you have it, the three critical components are Launch Speed, Launch Angle, and Running Speed. Exactly what we already have in the x-stats. Except re-arranged and approached in a different way to better describe the player than the x-stats. And dependent on the remaining layers (Carry, Spray, Fielding Alignment, Fielder Performance) to better describe the play than the x-stats.
I talked a lot about this two years ago. To catch everyone up, the idea is that a hit (or out) happened. And our job is simply to explain how that hit (or out) happened. It's not 35% of a hit or 72% of a hit or even 1% or 99%. It's exactly 100% (if it was a hit) or 0% (if it was an out). And so, what makes up that 100% (or 0%)?
We have making contact. Just making contact will lead to a hit about 33% of the time.
We have the launch speed. That can bring that number as high as 80% with maximum speed or as low as 20% with minimum speed. In other words, the Layer of launch speed will be somewhere between -13% and +47% (for an average of 0% for an average batter).
How about launch angle? Well, depending on the launch speed, that can explain as high as 100% (think easy HR) or as low as 0% (think easy popup). So, the Layer of launch angle can be between -70% and +80% (for again an average of 0%).
What else can explain the hit or out? We have the carry of the ball (meaning the spin and/or wind and/or specific ball characteristics... remember every ball is as unique as a snowflake, even if it comes from the same batch).
We have the spray angle (meaning finding those gaps between fielders or clearing the fence).
We have the running speed of the batter to beat out grounders.
We have the fielding alignment on that specific play.
We have the specific fielder involved on that specific play (did they make a great play, or a terrible play, or something in-between).
Other than the unrecorded events like lights, sun, and catwalks, all of these variables comprise the totality of what can happen to a batted ball that will turn it into a basehit or an out.
And so, that's what we do. We go thru each one of the 120,000 batted balls, and make sure that every layer is given a value so that the sum total of those layers is exactly 100% (for a basehit) or 0% (for a batted ball out). In other words, we will exactly describe each play.
WHY
Why do we do that? Well, now that we have each component perfectly described retroactively, we can use that information to ALSO predict future batted balls. Since some components are more in the control of the batter, then those components will carry more weight for predictions.
Let's take an obvious one. Pete Crow-Armstrong makes a highlight play, earning +0.90 OAA (outs above average). If PCA is earning +0.90 OAA, then guess what, the batter is going to get -90% for the Fielder Layer in Layered Hit Probability. Remember, we are trying to explain the result (an out in this case) and so how do we get to 0% hits? Well, the batter may have gotten great contact at a great launch angle and he was probably sitting at +90% in layers, but the fielding layer was worth -90% and so that's 0% hits. That describes that play. But what about the PLAYER? Well, in that case, we likely put most of our weight on his launch speed and launch angle.
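To make the bookkeeping concrete, here is a small sketch of one such play. Only the structure comes from the text (layers that sum to 1 for a hit or 0 for an out, and a Fielder Layer equal to the negative of the fielder's OAA on the play). The individual layer values are hypothetical, chosen so the batter sits at +90% before the fielder gets involved.

```python
import math

# Hypothetical layer values for one hard-hit ball that was caught.
play = {
    "contact": 0.33,            # baseline for simply making contact
    "launch_speed": 0.32,       # hypothetical
    "launch_angle": 0.25,       # hypothetical
    "carry": 0.00,
    "spray_angle": 0.00,
    "running_speed": 0.00,
    "fielding_alignment": 0.00,
}

oaa_on_play = 0.90              # fielder credit on the highlight catch
play["fielder"] = -oaa_on_play  # the batter is debited the same amount


def explains_outcome(layers: dict, was_hit: bool, tol: float = 1e-9) -> bool:
    """True if the layers sum to 1 (hit) or 0 (out), i.e. fully describe the play."""
    outcome = 1.0 if was_hit else 0.0
    return math.isclose(sum(layers.values()), outcome, abs_tol=tol)


print(explains_outcome(play, was_hit=False))
```

Keeping the layers separate like this is what lets the Fielder Layer be dropped later when the question shifts from the play to the player.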
WEIGHTS
How much weight? And how about all the other layers? Well, I'm glad you asked.
Running a correlation of current season layers to next season BACON (batting average on contact), we get an r=0.56. In contrast, the traditional xBA gives us a correlation of r=0.45. So, right away, we know we've got something value-added here.
Let's look at it layer by layer. The first and most obvious one is launch speed. At a p value of 0.0000000 (make that 39 zeroes), it's clear the Launch Speed Layer is critical. Its weight is 0.64. I don't think we need to belabour the value of launch speed.
Launch Angle Layer is the next most important (also a p-value of 0, but this time to 7 zeroes). Its weight is 0.28.
Batter Running Speed Layer, with a p-value of 0, and a full weight of 1.00. This one makes the most sense: the batter's running speed IS the actual description of the batter. This is unlike launch speed and launch angle which is a product of his skill, and not the skill itself. It's pretty close naturally. But it is not exact. That's why launch speed has a weight of 0.64 and not 1, and why launch angle is 0.28 and not 1.
That said, on a seasonal basis, batter by batter, the Launch Speed layer is between -9% and +14%. Taking 0.64 of that, and we can say that the Launch Speed Skill will range from -6% to +9%.
Launch Angle Layer, observed at +/-12% would establish a Launch Angle skill of -3% to +3%.
Running Speed Layer goes from -1% to +2%.
I hope this is making sense. Anyway, let's keep going. The next important layer, far behind these, is Fielding Alignment Layer, with a p-value of 0.09. Now, that is a high number, high enough that we might even suggest that Fielding Alignment has no predictive value. Its average coefficient is 0.11, but the range of possible coefficient is -.02 to +.25. It's enough for us to simply not use it at all for predictive purposes. To the extent that you want to use it: the observed range is +/-6%, and so the Fielding Alignment Skill is +/-0.6%.
After that, it's the Spray Angle Layer, at a p value of 0.22. This one is even more clear that it adds almost no value. The weight is 0.10, but it can possibly range from -0.06 to +0.25. When you see a p value that high, you are basically going to say Random Variation. The observed range is +/-5%, which means the Spray Angle Skill is +/-0.5%. But like I said, it may as well be 0.
Finally, the Carry Layer, at a p value of 0.48 is really screaming Random Variation. And with a coefficient close to 0, it's really not worth even discussing it. The observed range is +/-5%, but the skill range is 0%. In other words, this Layer as well as the Fielding Layer, is perfectly apropos for describing the PLAY and totally unusable in describing the PLAYER.
To summarize the weights and skill ranges of each:
Weight, Skill Layer, Component
0.64, -6% to +9%, Launch Speed
0.28, -3% to +3%, Launch Angle
1.00, -1% to +2%, Running Speed
0.11, -0.6% to +0.6%, Fielding Alignment
0.10, -0.5% to +0.5%, Spray Angle
0.00, 0%, Carry
0.00, 0%, Fielder
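The skill ranges in this table are just the observed seasonal ranges multiplied by the weights. Here is a short sketch of that arithmetic, using the weights and observed ranges quoted above (the dictionary and function names are mine, for illustration only):

```python
# Weights and observed seasonal ranges (in percentage points) as quoted in the text.
WEIGHTS = {
    "Launch Speed": 0.64,
    "Launch Angle": 0.28,
    "Running Speed": 1.00,
    "Fielding Alignment": 0.11,
    "Spray Angle": 0.10,
}
OBSERVED = {
    "Launch Speed": (-9, 14),
    "Launch Angle": (-12, 12),
    "Running Speed": (-1, 2),
    "Fielding Alignment": (-6, 6),
    "Spray Angle": (-5, 5),
}
# Carry and Fielder have a weight of 0, so their skill range is 0 by construction.


def skill_range(component: str) -> tuple:
    """Scale the observed layer range by the component's predictive weight."""
    weight = WEIGHTS[component]
    low, high = OBSERVED[component]
    return (weight * low, weight * high)


for name in OBSERVED:
    low, high = skill_range(name)
    print(f"{name}: {low:+.1f}% to {high:+.1f}%")
```

The table shows these values after rounding, so the unrounded products will differ slightly from it.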
All in all, what are we most interested in, with a hit probability metric? Launch Speed, Launch Angle, and Running Speed. Which is how xBA currently exists.
However, breaking it up into components allows us to weight each of the three separately. And so, that's how we can bring our correlation from r=0.45 using the amalgamated xBA to r=0.56 using a Layered approach.
And yes, the individual layers will be available, batter by batter on a seasonal basis, on Savant at some point. I don't know when.
Having thoroughly refuted several times, both by myself and other independent researchers, the idea that the spray direction is the missing ingredient in the x-stats, the question remains: what are the missing ingredients?
Someone brought up the case of Isaac Paredes, who is a heavy pull batter. However, there is another attribute of Paredes: he does not hit the ball hard. Now, you may think that the x-stats ALREADY account for the exit velocity. After all, the two main ingredients are launch angle and speed. We account for the launch speed. Don't we? Well, once again, I must talk about the difference between modeling a PLAY and modeling a PLAYER. The x-stats, traditionally, evaluate PLAYS. But, since we are interested in PLAYERS, we limit the variables so that we focus on the PLAYERS. In other words, yes, we evaluate each play, one at a time. But instead of considering AS MANY variables as we can that went into that play, we consider AS FEW variables as we can: only those that the player themselves have a strong influence over.
Launch speed is an easy one to include on an event by event level. Launch angle as well (the easiest one that separates groundballs from home runs). The Spray Direction is one that is needed on the play, but is not needed for the player (as we've learned many times). So, we ignore that one. We include the Seasonal Sprint Speed of the runner, as that's important on groundballs.
Which gets us back to Launch Speed. Remember last night, I created a profile of each batter, to establish their Spray Tendency? Well, what if we do the same thing, but with Launch Speed? That is, let's create a profile of a batter based on how hard they hit the ball.
Now, you may think: we ALREADY account for this on a play level, right? Yes, we do. But, what if a 100mph batted ball by Isaac Paredes is different from a 100mph batted ball by Giancarlo Stanton, even when both are hit at 20 degrees of launch? In other words, we want Launch Speed to pull double-duty: we want to know the launch speed on that play, but we also want to know the batter's seasonal launch speed.
So, do we see a bias based on a batter's seasonal launch speed? Yes. Yes, we do.
Here's what I did, so you can feel free to replicate. I treat the 2016-2019 years as one season and the 2020-present (thru June 5, 2024) years as a second season. I do this on the idea that a player has a general speed tendency that spans multiple years. This lets me increase my sample size for each season. I also make sure that a batter that hits on both sides is considered two distinct players.
The speed tendency follows the Escape Velocity method for Adjusted speed: greatest(88, h_launch_speed). For every batted ball, I take the greater of the launch speed and 88. And I average that.
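In Python terms, the `greatest(88, h_launch_speed)` step and the averaging look like the sketch below. The function name and the sample speeds are made up for illustration.

```python
def speed_tendency(launch_speeds, floor: float = 88.0) -> float:
    """Average launch speed after raising every batted ball below `floor` up to `floor`."""
    if not launch_speeds:
        raise ValueError("need at least one batted ball")
    adjusted = [max(floor, speed) for speed in launch_speeds]
    return sum(adjusted) / len(adjusted)


# Hypothetical batted balls (mph): the 70 is treated as 88 before averaging.
print(speed_tendency([70, 88, 100, 110]))
```

Flooring at 88 means all weak contact counts the same, so a batter's tendency is driven by how hard his well-struck balls are rather than by how soft his mishits are.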
Anyway, I use the same Pascal method of binning I did last night, the 10/20/40/20/10 split.
So, on to the fascinating results. For the weakest batters, the Paredes and Arraez and so on, their xwOBA was .306, while their actual wOBA was .318. That is an enormous bias of 12 wOBA points. The next weakest batters had .339 xwOBA and .345 actual wOBA for a bias of 6 points.
The strongest batters had an xwOBA of .452 and a wOBA of .442, for a 10 point shortfall. The next set of strongest batters had an xwOBA of .411 and a wOBA of .402 for a 9 point shortfall. The middle group were pretty much even.
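For anyone replicating this, the bias is simply actual wOBA minus xwOBA, expressed in points. Here is a quick sketch using the four bins quoted above (the middle bin is left out because its exact values are not given, and the bin labels are mine):

```python
# (xwOBA, actual wOBA) for each launch-speed bin, as quoted in the text.
bins = {
    "weakest": (0.306, 0.318),
    "next weakest": (0.339, 0.345),
    "next strongest": (0.411, 0.402),
    "strongest": (0.452, 0.442),
}


def bias_points(xwoba: float, woba: float) -> int:
    """Actual minus expected, in wOBA points (thousandths)."""
    return round((woba - xwoba) * 1000)


for label, (xwoba, woba) in bins.items():
    print(f"{label}: {bias_points(xwoba, woba):+d} points")
```

A positive number means the group outperformed its xwOBA, and a negative number means it fell short.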
Now, before we get TOO excited, what else could cause this? I have a few thoughts, but let me just leave this here for now.
I have to write one of these blog posts every year because folks are so disbelieving. And it's not just my research. I MUCH prefer when others do this research so that there's no conflict of interest.
I'll lay out my method, and you can feel free to reproduce it however you can. There's some data you may not necessarily have, but you'd be able to estimate it.
Anyway, here we go. Again.
First the data: 2016 to present (thru Jun 4, 2024), regular season and playoffs. Only hit-into-play. We want actual wOBA and xwOBA. Minimum 500 batted balls for each batter over the entire time period. This gives me 593 batters. Hopefully you get something pretty close to that.
Next, we create a spray tendency for each batter. In the past, I would just take all their batted balls to create their spray tendency. But, inspired by the point Ben Clemens recently made (who was studying a similar issue), this time I've focused on batted balls with a launch angle of 4 to 36 degrees, for balls hit 200+ feet. This is basically line drives and flyballs, and the type of batted balls that folks talk about when they talk about pull hitters and spray hitters.
But, just for completeness, I'll also do it my usual way of looking at all batted balls to establish the spray tendency. I'll do that at the end. For now, we'll follow the Clemens-inspired approach.
For the 593 batters, I take the 10% most extreme pull hitters. There's 59 of them. Their spray tendency is -9.5 degrees (negative meaning pull). Then I take the 20% next most extreme pull hitters. That's 119 batters with a spray tendency of -7.0 degrees.
I take the 10% most extreme spray hitters. There's also 59 of them, with a spray tendency of +1.6 degrees. The next 20% are at -1.6 degrees. Finally, the middle 40%, 237 batters, have a spray tendency of -4.3 degrees.
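The 10/20/40/20/10 split can be sketched as below. How the bin boundaries get rounded is my assumption, but rounding the cumulative cut points does reproduce the 59 / 119 / 237 / 119 / 59 group sizes for 593 batters.

```python
def split_10_20_40_20_10(sorted_items, cum_fractions=(0.1, 0.3, 0.7, 0.9, 1.0)):
    """Cut an already-sorted list into five bins at the given cumulative fractions."""
    n = len(sorted_items)
    edges = [0] + [round(n * c) for c in cum_fractions]
    return [sorted_items[lo:hi] for lo, hi in zip(edges, edges[1:])]


# Stand-in for 593 batters sorted from most pull to most spray.
batters = list(range(593))
print([len(group) for group in split_10_20_40_20_10(batters)])
```

In a real run the list would be batter records sorted by spray tendency, and each bin's wOBA and xwOBA would then be aggregated over its batted balls.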
Next, for each group, we look at their actual wOBA and their xwOBA. Now remember, the xwOBA does NOT look at a batter's spray direction, whether at a single play level, or at a player tendency level. It is simply ignored. So, if we find that there is a difference between actual wOBA and xwOBA then this is evidence that the spray variable needs to be added to the model. If they are a match, then we don't need it (or at least, we haven't found any evidence with this method that it is needed).
What do we find with the most extreme pull hitters, those at -9.5 degrees of spray? Actual wOBA of .386, xwOBA of .385. How about going the other way, the most extreme spray batters, those at +1.6 degrees of spray? They have a .362 actual wOBA... and .362 xwOBA. Identical.
How about the rest of the three bins? Bin 2 is .379 actual and .379 xwOBA. Identical. Middle bin is .370 actual and .370 xwOBA. Identical. Bin 4 is .363 actual and .365 xwOBA.
So... yeah... we don't need to consider the spray tendency of the batter to model their effectiveness.
***
I said I would rerun everything doing it my usual way of using all batted balls to establish spray tendency. The results are almost as boring, but I'll lay it out, from bin 1 (most pull tendency) to bin 5 (most spray tendency). Actual wOBA first, xwOBA second, difference third. Ready?
-12 degrees, .382, .385, -.002 (rounding)
-10 degrees, .372, .371, +.001
-8 degrees, .378, .379, -.001
-6 degrees, .364, .363, +.001
-3 degrees, .348, .345, +.003
So... yeah... as it turns out, it doesn't really matter how I establish the spray tendency. We just get similar conclusions.
***
Now... How? HOW? HOW is it possible to ignore spray tendency and still be able to get the player wOBA to match to their xwOBA? Simply put: opposing teams know the pull/spray tendency of the batters and position their fielders accordingly. How about the HR? Well, that's true, but if you miss the HR, guess what, there's an outfielder who was positioned close by to turn that almost-HR into an out.
What we found in 2016, when we had such limited data, was that we could ignore the spray variable. That finding is being upheld with tons more data. And this conclusion has been reinforced by other researchers who also found the same thing.
Long story short: while you need the spray angle to describe the PLAY, you do NOT need the spray angle to describe the (effectiveness of the) player. You can use the spray angle to show the PROFILE of the player, but it won't alter our opinion as to their overall performance.
I'll see you again in six months, where I'll do similar research in different ways. Again.
This is league-wide data, 2021-2023. LHH are "mirrored", so that all their pull data is on the left side of the chart, to match RHH. (click to embiggen)
At each launch angle level, the distance is higher the more you pull. It has to do with how well a ball is hit more than anything.
At 28 degree launch for example, distance is maximized when you aim for the LF/CF gap. The more you hook, or the more you slice, the more likely you mishit the ball (lower speed). There's also the effect of the spin of the ball (the more you square up, the more likely you have backspin, and the more you mishit, the more sidespin. Just think of how you golf.)
I've done Delta Maps using Distances by launch angle x speed and comparing to the league average, or showing wOBA changes year to year along the same lines.
Kyle Bland showed a really snazzy one by using launch angle and some derivation of spray angle, and comparing the frequency of the player (Bo Bichette in this case) to the league average. It's really nice. So, I did the same thing, not as nice, but, it is more accurate since I use the actual spray direction, as well as showing the numerical values. Make no mistake, if I was as talented as Kyle, I'd overlay what I just did with heat maps as well. I'm not, so I won't. (Click to embiggen.)
This shows how many batted balls Bo Bichette (2021-2023) has at that particular combination of launch angle and spray direction, compared to the league average.
A few notes here.
The top row is the spray direction, where -45 is 3B foul line and +45 is 1B foul line, with 0 up the middle
The left column is the launch angle from -90 (down the ground) to +90 (straight up), with 0 being horizontal to the ground
Any batted ball short of 10 feet is put into its own short distance basket, labelled above as Chop
Any batted ball that was caught in foul territory is under Foul
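For anyone who wants to build a similar delta map, here is a rough sketch. The 8-degree launch angle bins match the rows mentioned here (+4, -4, -12, and so on). The 10-degree spray bins and the scaling of league counts to the player's batted ball total are my assumptions for illustration, and the Chop and Foul baskets are left out.

```python
import math
from collections import Counter


def cell(launch_angle, spray, la_width=8, spray_width=10):
    """Return the (launch angle, spray) bin centers for one batted ball."""
    la_center = math.floor(launch_angle / la_width) * la_width + la_width / 2
    # Spray runs from -45 (3B line) to +45 (1B line); clamp to the outermost bins.
    last_bin = math.ceil(90 / spray_width) - 1
    index = max(0, min(math.floor((spray + 45) / spray_width), last_bin))
    spray_center = -45 + index * spray_width + spray_width / 2
    return (la_center, spray_center)


def delta_map(player_balls, league_balls):
    """Player count in each cell minus the league count scaled to the player's total."""
    player = Counter(cell(la, sp) for la, sp in player_balls)
    league = Counter(cell(la, sp) for la, sp in league_balls)
    scale = len(player_balls) / len(league_balls)
    return {c: player[c] - league[c] * scale for c in set(player) | set(league)}


# Tiny made-up example: each tuple is (launch angle, spray direction).
league = [(2, 3), (2, 3), (30, -20), (30, -20)]
player = [(5, 1), (6, -2)]
for key, value in sorted(delta_map(player, league).items()):
    print(key, f"{value:+.1f}")
```

Positive cells are where the batter has more batted balls than a league-average batter with the same total, and negative cells are where he has fewer.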
So a few more notes:
Bichette is a HEAVY groundball batter: in addition to all those reds you see in the grand column at +4, -4, and -12 degrees, there's the huge red of 42 more choppers than league average
You can also see the complete lack of popups, at 36 degrees of launch and higher
Similar to Kyle's snazzy chart, we can see a preponderance of groundballs hit to the 1B side, and a lack of popups to left field
And much fewer foul outs than the league average
So, yeah, Kyle's presentation is brilliant, and we can tell a far better story by showing it relative to the league average. Thank you Kyle.
UPDATE: Here is Mookie Betts (click to embiggen)
Betts (2021-2023) is a big time flyball hitter. But good flyballs, not popups.
You can also see the complete lack of choppers, having 97 fewer than the league average. Betts has 149, and the league average is 97 above that, or 246.
He pulls all his line drive and flyballs, and really abandons the right side infield and short outfield.
A reminder that 28 degrees of launch angle is where you find most homeruns, though you can get them also at 20 degrees if you pull them enough
Once you can hit a ball 430 feet, every extra foot is irrelevant. Hitting a ball 430+ feet is a HR, regardless of distance, and hence the wOBA value of 2.
When you hit a ball under 350 feet, well, adding distance, or SUBTRACTING distance, is about the same. When you hit a ball under 200 feet, every extra foot helps. But once you get to 220 feet, every foot HURTS. Until you get to 320-330 feet or so.
So, if you look at all batted balls 0 to 350 feet, as a group, it's basically immune to extra or lost distance. Adding a foot or subtracting a foot doesn't change anything.
The rapid acceleration happens at 350+.
Now, if you follow baseball, you can guess the reason: there's a gap between the infield and outfield. Infielders play up to 150 feet from home plate, while outfielders play starting at 280 feet from home, up to about 330 feet from home. So, you can get success between the infield/outfield, or beyond the outfielders (and/or beyond the fence).
When you hit a ball at 95 mph, at the ideal launch angle (roughly 24-32 degrees), that ball will travel about 350 feet. This is why the Hard Hit rate really starts at around 95mph. It's not arbitrary. 90 mph is not enough to get you to 350. And 350 really is a threshold that needs to be cleared. Naturally, 100 is better than 95 and 105 is better than 100. Just saying 95+ for hard hit is just a gateway to better understanding Exit Velocity.
And so, when you look at a ball having more or less carry because of wind or any other reason, it's players who hit the ball 350-430 feet that are going to be the most affected.
Hitting a 450 foot HR is very indicative of a batter's talent. It shows that he has raw power and it shows that he can really put the barrel on the ball.
Hitting a 110 mph high popup to an outfielder for an easy out is also a good indication of a batter's talent. It shows that he has raw power and that a small mistiming is what kept him from hitting a 450 foot HR. This is what is called a Major League Out. For that particular PLAY, an out is an out, and is always bad. For that particular PLAYER, a Major League Out is almost a HR.
Similarly, while a Texas Leaguer that clears the infield and lands in front of the outfield is always good for that PLAY, it is not a good indication of talent for that PLAYER.
So, let's talk about Expected wOBA and Predictive wOBA. Expected wOBA is the expected value of that PLAY. A Texas Leaguer is going to have a near 100% hit probability, and so if you have a low launch speed and a high launch angle, you are going to get an Expected Hit Probability that will approach 100%. It will explain that PLAY in RETROSPECT. It is much (much much) better to think of Expected Value in Retrospect, as in: The Expected Hit Probability WAS... Obviously, the word Expected can be used in both backward (expected was) and forward (expected will be), and so is confusing and ambiguous. The x-stats that you see, whether at Savant or anywhere else on the Web, are almost always meant to be retrospective, and simply measuring the PLAY.
Predictive wOBA is different. Predictive wOBA is tied to the PLAYER and not the PLAY. This is a critical distinction to make. A Major League Out is a much better outcome in describing the talent of a player than a Texas Leaguer Hit. When you see a Major League Out, as a fan, you should be disappointed, but as a scout, you should be elated. And the reverse when you see a Texas Leaguer Hit. The difference between Expected wOBA (describing the PLAY) and Predictive wOBA (describing the PLAYER) is what you need to constantly remind yourself of.
Let me show you three charts (click to embiggen). The first maps the current season's actual wOBA (actually it's wOBAcon, since we are only looking at batted balls, or Contacted plate appearances) to next season's wOBA (wOBAcon actually), for the 2020-2023 seasons, minimum 100 batted balls. Why do we compare to next season's wOBA? Because that is a (mostly) unbiased estimate of a batter's talent. That is a correlation of almost r=.5, which is pretty good. You can see that the slope of the estimate is close to .5, which means that next season's wOBA can be estimated as half-way between the current season's wOBA and the league average. So, a HR (wOBA of 2.000) in 2022 will indicate a wOBA of 1.180 of talent in 2023 (half-way between 2.000 and league average of around .360).
As we know, Actual Outcomes are filled with vagaries of the fielders and the park and the ball and on and on. This is why we prefer Expected wOBA over Actual wOBA. Expected wOBA is focused on those launch characteristics most in control of the batter (launch angle and speed), without worrying about whether the ball carries for 300 feet or 320 feet, or is pulled at 20 degrees or 30 degrees, or how well the fielders are positioned or how well the fielder reads the ball. All of those variables are what turn an Expected wOBA into an Actual wOBA. And Expected wOBA describes a batter's talent better than Actual wOBA, which you can see by the correlation having an r above .55. The slope of the line also suggests that you would weight the Expected value at 58%, and the league average at 42%. Every combination of Launch Angle (from minus 90 to plus 90) and Speed (from 0 to 125) has a distinct Expected wOBA value. The chart is obviously massive at 181 x 126 entries.
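To make the weighting arithmetic concrete, here is a minimal sketch in Python. The function name `blend_with_league`, the `.450` example input and the `.360` league average default are illustrative assumptions built from the round numbers in the text, not the actual model.

```python
def blend_with_league(observed, weight, league_avg=0.360):
    """Weight the observed value by `weight` and the league average by the rest."""
    return weight * observed + (1 - weight) * league_avg

# Actual wOBAcon: slope near .5, so a HR (wOBA of 2.000) lands
# half-way between 2.000 and the league average.
hr_talent = blend_with_league(2.000, 0.50)

# Expected wOBA: weight the Expected value at 58%, the league average at 42%.
# The .450 input is a made-up example, not a figure from the study.
x_talent = blend_with_league(0.450, 0.58)

# Size of the Expected wOBA lookup: one entry per whole degree of launch
# angle (-90..90) and per whole mph of launch speed (0..125).
n_angles = len(range(-90, 91))
n_speeds = len(range(0, 126))

print(round(hr_talent, 3), round(x_talent, 3), n_angles, n_speeds)
```

The only thing that changes between the Actual and Expected cases is the weight, which is read off the slope of each chart.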
Now we come to the star of the show, Predictive wOBA. How does it work? We break up a batter's launch characteristics of each play, first along Launch Angle, into three categories. We have the Ideal Launch Angle, the Sweetspot range, of 8 to 32 degrees. We have launch angles above 32 and launch angles below 8 degrees.
Then we break up a batter's Launch Speed into four categories: 105+, 100 to 104.999, 95 to 99.999, and under 95 mph.
This gives us 12 combinations of speed and angle. For each combination, we get a Predictive Value (analogous to Expected Value, but in terms of the PLAYER, and not the PLAY). Here are those values.
So, when a batter gets his Major League Out, or any batted ball at over 32 degrees of launch at 100+ mph, that has a Predictive wOBA value of .838. This is one of the BEST things a batter can do, as it indicates TALENT. That's what Predictions are: an estimate of the TRUE TALENT of a PLAYER. That perfect hit, a ball hit at the Ideal Launch Angle of 8 to 32 degrees at 105+ mph, is only SLIGHTLY more indicative of talent than a Major League Out, with a Predictive wOBA value of .867.
The worst thing a batter can do is hit the ball at a high launch angle and low speed, as that has a true talent value (a Predictive wOBA value) of .206.
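Here is a rough sketch of that binning in Python. The bin labels and function names are my own, and the lookup holds only the three values quoted above. Treating "high launch angle and low speed" as the over-32, under-95 bin is an assumption.

```python
def angle_bin(launch_angle):
    """Three launch angle categories: below 8, Sweetspot (8 to 32), above 32."""
    if launch_angle < 8:
        return "low"
    if launch_angle <= 32:
        return "sweetspot"
    return "high"

def speed_bin(launch_speed):
    """Four launch speed categories: under 95, 95-99.999, 100-104.999, 105+."""
    if launch_speed >= 105:
        return "105+"
    if launch_speed >= 100:
        return "100-105"
    if launch_speed >= 95:
        return "95-100"
    return "<95"

# Only the values quoted in the text; every other bin is left out on purpose.
KNOWN_PREDICTIVE_WOBA = {
    ("high", "100-105"): 0.838,   # Major League Out: 100+ mph covers two speed bins
    ("high", "105+"): 0.838,
    ("sweetspot", "105+"): 0.867,
    ("high", "<95"): 0.206,       # assumption: "low speed" means under 95 mph
}

def predictive_woba(launch_angle, launch_speed):
    """Return the quoted Predictive wOBA, or None when the text gives no value."""
    key = (angle_bin(launch_angle), speed_bin(launch_speed))
    return KNOWN_PREDICTIVE_WOBA.get(key)
```

Note how the 100+ mph value at high launch angles fills two of the twelve angle/speed combinations with a single number.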
Once we apply the Predictive wOBA on each batted ball for every player and aggregate it at the season level, we can then compare to the next season's Actual wOBA. And this is what we get: a correlation of r=0.61. Indeed, if you include all three measures, the Actual wOBA, the Expected wOBA and the Predictive wOBA, we STILL get an r=0.61 to next season's wOBA. This suggests that creating these 12 bins (it's actually 8, as some bins stretch across more than one combination) is sufficient to describe a batter's profile. And we can completely ignore a batter's Actual wOBA as well as their Expected wOBA.
Can you improve this? The next step, to make it truly predictive, is to also incorporate the number of batted balls in the sample. The higher the number, the more indicative the outcomes are. But, we'll save that for a future thread.
So, we no longer need to compare Expected wOBA to Actual wOBA and talk about luck or Random Variation as being the distinguishing feature as to why the Expected wOBA diverged from Actual wOBA. No. What we actually care about is Predictive wOBA, and we don't even care about Actual wOBA any more, not if we care about the True Talent of our batter.
Next time, I'll repeat this process for Pitchers. I haven't done it yet, so I'm just as curious as you are.
I created a metric I call Speed above Escape Velocity (or Escape Velo for short), which is the number of mph above the minimum threshold in order to "escape" and create damage, where we can see some serious sh!t. What is that threshold level that the batter is trying to escape? Well, would you believe 88 mph? Yes, I'm thankful for that too.
Since 2020, the leaders in Escape Velo include the mainstays (Judge, Stanton) along with Sano in 2020, Acuna this season, and Vladdy in 2021. The top batters are usually around +10 mph of Escape Velo. Equally important for the purposes of this study are the players bringing up the rear, with David Fletcher twice appearing in the bottom 10. (A small note for those unaware: any batted ball below 88 mph is given a value of 0. You accumulate speed above 88 only.)
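If you want to compute it yourself, here is a minimal sketch in Python. The function names and the sample speeds are made up, and I am assuming the average is taken over all batted balls, with the sub-88 balls counting as zeros.

```python
ESCAPE_THRESHOLD_MPH = 88  # the minimum threshold named in the text

def escape_velo(launch_speed):
    """Speed above the threshold; anything below 88 mph is given a value of 0."""
    return max(0.0, launch_speed - ESCAPE_THRESHOLD_MPH)

def average_escape_velo(launch_speeds):
    """Average Escape Velo over every batted ball, zeros included (an assumption)."""
    return sum(escape_velo(s) for s in launch_speeds) / len(launch_speeds)

# Made-up batted balls: two below the threshold, two above it.
sample_speeds = [70.0, 88.0, 98.0, 110.0]
sample_avg = average_escape_velo(sample_speeds)
print(sample_avg)
```

The 70 mph and 88 mph balls both contribute nothing, which is the whole point: a batter does not get penalized more for a 65 mph mishit than for an 85 mph one.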
Here's the study, if you want to try to recreate it. I looked at all batters, each season, from 2020 to 2023. For 2021 and 2022, I selected any batter with at least 100 batted balls, while for 2020 and 2023, it was any batter with at least 50 batted balls. Typically, we'd get about 400 batters in each season.
Then I ranked the batters, each season, in terms of their average Escape Velo. The 10% highest Escape Velo batters were put in Group 5. The next 20% highest were put in Group 4. The 10% lowest Escape Velo were put in Group 1, while the next 20% lowest were in Group 2. The middle 40% were put in Group 3.
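For anyone recreating the grouping, here is one way to sketch it in Python. The function name and the mid-rank percentile rule are my own choices for illustration. The text only specifies the 10/20/40/20/10 split.

```python
def assign_groups(avg_escape_velo_by_batter):
    """Rank batters by average Escape Velo and split them 10/20/40/20/10
    into Group 1 (lowest) through Group 5 (highest)."""
    ranked = sorted(avg_escape_velo_by_batter, key=avg_escape_velo_by_batter.get)
    n = len(ranked)
    # Cumulative upper percentile bound for each group.
    cutoffs = [(0.10, 1), (0.30, 2), (0.70, 3), (0.90, 4), (1.00, 5)]
    groups = {}
    for rank, batter in enumerate(ranked):
        pct = (rank + 0.5) / n  # mid-rank percentile, always strictly between 0 and 1
        for upper, group in cutoffs:
            if pct <= upper:
                groups[batter] = group
                break
    return groups

# Ten hypothetical batters with made-up average Escape Velo values.
demo = {f"batter_{i}": 3.0 + 0.7 * i for i in range(10)}
demo_groups = assign_groups(demo)
demo_counts = [list(demo_groups.values()).count(g) for g in range(1, 6)]
print(demo_counts)
```

You would run this once per season, since the ranking is done within each season.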
So you get a sense of the scale, here's the average Escape Velo for the five groups, along with their wOBA on contact:
Group 1: 3.7 mph, .321
Group 2: 4.8 mph, .348
Group 3: 6.0 mph, .379
Group 4: 7.2 mph, .413
Group 5: 8.7 mph, .451
So, no surprise: the more power a player has, the more effective they will be (once they make contact anyway).
Now that we have our five groups, we can repeat the previous study that looked at wOBA by launch angle, and look at it for each of the five groups. Here's how that looks (click to embiggen):
The first thing to notice is that EVERY quality of batter, from your Giancarlo Judge to your Fletchers, will have similar production when launching a ball at 16 degrees or less. Since the differentiator among our five groups is the exit speed, we can come to the conclusion that the inherent power of the batter won't impact their production at line drive or ground ball angles.
Yes, you'd still prefer a 100 mph line drive to an 80 mph line drive. But the gap in production between our strongest batters with their 9 mph of Escape Velo and our weakest batters at 4 mph of Escape Velo is quite muted. Why is that? Because at 16 degrees or less, there's not enough distance to do anything. A line drive at 80 mph or at 90 mph is pretty much the same thing: it's a single.
But, starting at 20 degrees of launch, things change very very much. It is at 28 degrees where we see the biggest difference in production. A power hitter at 28 degrees will be able to clear the fence, while a weak batter at 28 degrees will simply hit the ball right to the outfielder for an easy out.
Essentially, every batter has their own launch angle profile. The powerful batters will peak at a launch angle of 28 degrees, with a range of 5 to 36 degrees, for an average of 20 degrees. That is their target launch angle region. But for the least powerful batters, their target launch angle range is from 0 to 32 degrees, for an average of 16 degrees.
This is why a batter like Luis Arraez can get away with doing the damage he does: as long as he can get the ball launched at the 8 to 16 degree launch angle range, then he doesn't really need to worry about getting high exit speed. And so, every batter should be aware of their average Escape Velo, and target a launch angle approach that maximizes their production.
Every now and then, someone asks why a hard-hit ball is defined as 95+ mph. While it may seem arbitrary, it is not. The data I am going to present is for the 2021-2023 seasons. (click to embiggen)
Note: Because 2023 hasn't had the advantage of the warmer months (current average temperature of 67.6 degrees), then to keep it on a similar scale, I made sure to select games in the 2021-2022 seasons with similar temperatures. Had I chosen all the games, the general pattern would be maintained, just a bit higher overall.
Here we see that under 95 mph, the wOBA is all generally the same. Why is that? Well, batted balls under 90 mph are generally mishits of some kind, which means they will be weak flyouts to the outfielders, or easy groundballs to the infielders. Basically, whether you hit the ball at 65 mph or 85 mph, you end up with similar overall results, even though we have a 20 mph gap. This is very very different when you compare balls hit at 90 mph and 110 mph. Even though that is a similar 20 mph gap, the results could not be any more different.
And so, absent knowing the launch angle, if you only have exit velocity on a batted ball, then the damage starts to happen at 95 mph. And that's why a hard hit happens at 95+.
This shows three charts: the top is for Distance, the middle is for Time (pitch release to landing) and the bottom is for wOBA.
The rows are by launch speed in steps of 5mph. The columns are by launch angle in steps of 8.
The green shows a drop in distance and time. The purple shows an increase in distance and time. One POSSIBLE inference is that there are more balls with topspin at the lower launch angles and more balls with backspin at the higher launch angles. And a POSSIBLE reason is that players are targeting higher launch angles, and so their attack angles are leading to more topspin. (It could also be the pitchers that are pitching to get batters to produce more topspin.)
The red boxes in the wOBA chart are where batters are getting more success in 2020, and the blue is where they are getting less success.
Also note that for wOBA I included the GB angles. I didn't bother for the dist+time ones because there was no change year to year. In other words, it's kind of hard to get a different distance+time, since all those balls will be hit with topspin.
In 2019, there was NO STRETCH where fielders converted as many plays into outs. The average was 69.1%
2020 is 3.8 standard deviations from 2019
DER is essentially 1 - BABIP
***
We are talking 2019 to 2020. I don’t know what level of talent influx you can have in the offseason that is targeted to fielding.
Fielding alignment might be one reason.
***
It’s not just a 2019-20 difference. This is consistent back to 2016 (and earlier).
HR rates are no different in 2020 than in 2016-19:
***
Those that have “angle” in them are using tracked data. The remainder (editor’s note: which is very very limited in 2020) is using stringer data. The tracked and untracked data will be biased, so be careful in making too strong a conclusion.
***
From 2010-2019, the DER for SP was .691, the same as for RP.
In 2020:
.718 SP
.703 RP
So, Starting Pitchers are driving the big change
***
This shows how often balls assigned to outfielders are converted into outs.
It removes balls assigned to infielders as well as any "impossible to make" plays. As you can see, slightly more plays are being made by outfielders, meaning slightly better positioning, or balls luckily getting closer to the outfielders. This is only one standard deviation, so, not THAT much of an impact. And since it's a different tracking software, it could be slightly different as a result.
One of the main projects we’re working on is Layered Hit Probability. The idea is straightforward: we see the number of hits that happened. So, break it down into how those hits happened.
Let’s take Justin Verlander. He had 498 batted balls. Since the league average is 0.332 hits per batted ball (aka BACON or batting average on contact), a league average pitcher would have allowed 166 hits. Verlander however allowed 137 hits, or 29 fewer than average. That is the second best number after Jeff Samardzija at -30. Verlander got good fielding behind him, to the tune of 3 more outs made (or 3 fewer hits allowed). His xBACON would suggest he’d allow 150 hits, which is 16 fewer hits than average. So, if we add it up, we can see that we can account for 19 fewer hits than league average based on good fielding and poor contact by batters. And so, there’s another -10 hits unaccounted for. We’ll find those as we keep layering in the variables (spray angle, fielding alignment, park, environment).
We can do this for all pitchers. Among the pitchers who gave up far fewer hits than we can (currently) account for are Kershaw and Bieber, both at -14. Bieber is interesting because he gave up hits right at the league average (186 compared to 185). His fielding hurt him by 2 hits, so we’d have expected him to allow 187 hits. And he got hit really hard to the tune of +13 xHits. And so, we’d have expected 200 hits based on bad luck fielding and hard contact. But he gave up 186 hits, 14 fewer than expected (so far). So, there’s -14 hits to account for. We’ll find them, whether it’s because of spraying the ball, or the fielding alignment or the park, and so on.
On the flip side is Lance Lynn, who gave up +7 more hits compared to league average. He got tremendous fielding support, to the tune of 7 hits. So, now we’re at +14 more hits compared to the expected. His xBACON was close to league average, at -1 hits. And so, we’re at +15 more hits unaccounted for. We’ll find them.
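The bookkeeping for these three pitchers can be sketched in a few lines of Python. The function name and the tuple layout are hypothetical. The numbers are the ones quoted above, all expressed as hits relative to league average, where negative means fewer hits allowed.

```python
def unaccounted_hits(actual_vs_avg, fielding_layer, contact_layer):
    """Hits relative to average that remain after removing the layers
    explained so far (fielding support and quality of contact)."""
    return actual_vs_avg - (fielding_layer + contact_layer)

# (actual hits vs. average, fielding layer, xBACON contact layer)
pitchers = {
    "Verlander": (137 - 166, -3, 150 - 166),  # 29 fewer; fielding -3; contact -16
    "Bieber": (186 - 185, +2, +13),           # +1; fielding hurt by 2; hit hard +13
    "Lynn": (+7, -7, -1),                     # +7; fielding saved 7; contact -1
}

leftover = {name: unaccounted_hits(*layers) for name, layers in pitchers.items()}
print(leftover)
```

Each new layer (spray angle, fielding alignment, park, environment) would become another term inside the parentheses, and the leftover should shrink toward zero as the layers are added.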
Once we have an accounting of where all these hits came from, that’s when the fun can start: which of these is really about the pitcher, and how much of it can we associate to the pitcher. As Voros taught us 20 years ago with DIPS, and Bill James even earlier with DER, the hits a pitcher allows are strongly influenced by many other components. It’ll be fun to figure it all out.
This is what I did. I looked at all batters in 2018-19 who had at least 100 batted balls in each season. For each hitter, I tracked the frequency of their launch angle in the sweet spot (8 to 32 degrees, where all the solid hits and HR come), less than 8 degrees (basically GB and very low line drives) and more than 32 degrees (basically high FB or popups).
I classified each hitter based on their change from 2018 to 2019 in the frequency of the above, and dumped them into 5 groups. The group that lowered their launch angle the most dropped by an average of 3.5 degrees. Those that raised the launch angle the most increased by an average of +4.4 degrees. The other three groups were: -0.8, +0.1, +2.0.
Now, why would we think there might be a change? Well, it's the mishits. At the top-end exit velocity, we wouldn't expect much if any change. But the more you deviate your swing plane from the oncoming pitch plane, the less flush you might hit the ball, and so the more prone you are to mishits, which will reduce your exit velocity. Of course, you are also prone to a complete swing and miss, which will turn a mishit (and its reduced exit velocity) into a no-hit, and so remove it from the sample!
Anyway, here's the change in exit velocity for our 5 groups:
+0.3 mph: Lowered Angle (Major)
-0.1 mph: Lowered Angle (Minor)
+0.4 mph: Neutral change in Angle
+0.2 mph: Uppered Angle (Minor)
+0.5 mph: Uppered Angle (Major)
As you can see, no real trend here. We do see an overall increase in speed from 2018 to 2019 of about 0.3 mph, which is consistent with the 0.4 mph that I reported earlier.
My next step will be to look at whether the exit velocity distribution changes by launch angle. You would THINK that if you lowered your launch angle, then your higher exit velocity will now happen at the lower launch angles. And similarly, if you increase your launch angle, the higher exit velocity will now happen at the higher launch angles. But, there are reasons to think that it shouldn't matter. I haven't looked at the data, so as soon as I do, I'll post my findings.
Here's her report (pdf). Other than to make one tiny almost inconsequential factual correction, the entirety of the report is hers, and went through no edits on my part. Based on her findings, in the offseason, we'll need to tweak our definitions.
If a ball is hit 390 feet to dead center, how do we want to handle the CF positioned 300 feet (0% catch prob) from home compared to being positioned 340 feet (100% catch prob) from home? And how do we want to handle the batter? Is it a ball hit 390 feet and so is a HR almost 50% of the time? Or is it a ball that lands 15 feet short of the fence, and so is a HR 0% of the time? And what about the pitcher in all this? Do we care about the ball, the batter, the pitcher, or the fielder? Or all of them? Are they intertwined or independent?
Trying to get “one” answer to multiple legitimate questions is how we get into trouble. If you make the metric so specific that it can ONLY answer one question, you’ve boxed yourself in.
This is why I like FIP: it does what it does, no more, no less. And it allows other things to be built on top of it. So in my view, I like speed+angle, because those are the two things the batter has the most influence on. And it’s “scaled” to hits or wOBA. “Hit Probability” is too specific a term for what the metric is doing. We may call it “hit probability”, but it’s more “Speed and Angle impact to hit probability”. That's the metric. If we wanted to include EVERYTHING, then guess what: the hit probability is what you see, it was either caught or not. You have to decide what you want to peel away, and more importantly WHY.
If you follow the FIP approach, what you care about is the influence the player has on a typical play, not THAT ball, and certainly not THAT play. There's no right or wrong answer. You just need to define your question very specifically, and live with the consequences of its implication. FIP takes a minimalist approach, doesn't try to do too much, and so, is flexible. That's why speed+angle is what we use for batted balls.
There's a lot of current and aspiring saberists out there. You'll usually find them at Fangraphs, like Eli and Craig. But you'll also find them in their own corner of the blogging world, like Hareeb, who tackles the case of Harrison Bader.
A couple of days ago, someone noted on Twitter that Bader had a weird combination of LOTS of barrels (which by definition requires at a minimum a high exit velocity) and a LOW AVERAGE exit velocity. For the two things to be true, Bader would have to have tons of batted balls both at the high and low exit velocity. Which he does.
Back to Hareeb, who then asks which is more indicative of talent, the high barrels or the low exit velocity. I already knew the answer (hint: there's a reason a metric gets created), but I didn't know the extent that it would be true. He first looks at barrels, avg EV and wOBAcon independently. He astutely notes:
That’s not a huge win, but it is a win, but since these are three ways of measuring a similar thing (quality of contact), they’re likely to be highly correlated
In other words, he realizes that the correlation is the first step not the last step. This is why in hockey, NetShots are highly correlated with future NetGoals: part of the NetShots is made up of past NetGoals. So, both past NetGoals and past NetShots would be correlated to future NetGoals. The RIGHT thing to do is look at NonGoalShots separate from Goals. But I digress.
He continues his research with this conclusion:
That’s.. a gigantic effect. Knowing barrel/contact% provides a HUGE amount of information on top of average exit velocity going forward to the next season.
...
Knowing barrels on top of average EV tells you a lot. Knowing average EV on top of barrels tells you a little.
In other words, it's the same spirit as to how I discussed the idea of separating goals and non-goal shots. He went about it in a clever way, looking at outliers to see the effect.
Anyway, terrific stuff, the kind of saber work we used to do at the old Baseball Boards (RIP), and have tried to continue ever since.
Focus on a hitter's hardest hit balls. Those are likely the ones where he got all of it, and so likely what he intended to do. The launch angle that resulted is probably what he's after. For Mookie Betts in 2016-2017, that launch angle was at 5 degrees (blue vertical line below) and 6 degrees (orange vertical line below). In 2018, it was very different at 16 degrees (green vertical line below).
Focusing on a hitter's hardest hit balls also gives us his intended spray direction. We can take all the batted balls at +/- 10 degrees of his intended spray direction to give us his personal straightaway spray direction. Interestingly, in 2016, that was -8 degrees, which is about typical for an MLB hitter, targeting a spot halfway between the shortstop and the 2b bag. In 2017 he pulled more, at -13 degrees, so targeting more directly toward the SS, or the gap between LF and CF. And in 2018, he did the same at -13 degrees. If we focus on these batted balls, balls that he hit at his presumed straightaway, we get the distance of each batted ball.
We can then determine the distance the league gets at his actual speed+angle, what we can call his xDistance. And then simply compare his actual distance to the xDistance, for the Extra Distance. That's what the chart below is showing. And we can see that in 2016-17, at the line drive angles, meaning 4-20 degrees, Betts was getting plenty of Extra Distance. Which is what you would get if you are hitting slightly under the ball. In 2018, he must have retooled his swing, since he did not get extra distance on his line drive angles. Indeed, he was short a bit.
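Here is a loose sketch of the Extra Distance idea in Python. The bucket widths (5 mph by 4 degrees), the function names and the sample batted balls are all assumptions for illustration. The actual xDistance lookup may be built quite differently.

```python
from collections import defaultdict

def bucket(speed, angle, speed_step=5, angle_step=4):
    """Group a batted ball with others of similar speed+angle (assumed widths)."""
    return (int(speed // speed_step), int(angle // angle_step))

def league_x_distance(league_balls):
    """league_balls: iterable of (speed, angle, distance).
    Returns the league average distance for each speed+angle bucket."""
    totals = defaultdict(lambda: [0.0, 0])
    for speed, angle, distance in league_balls:
        entry = totals[bucket(speed, angle)]
        entry[0] += distance
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def extra_distance(ball, x_table):
    """Actual distance minus the league's xDistance at the same speed+angle."""
    speed, angle, distance = ball
    return distance - x_table[bucket(speed, angle)]

# Made-up league sample and one made-up batted ball from our hitter.
league_sample = [(101.0, 13.0, 300.0), (103.0, 14.0, 320.0)]
x_table = league_x_distance(league_sample)
hitter_extra = extra_distance((102.0, 12.0, 325.0), x_table)
print(hitter_extra)
```

A positive number means the ball carried farther than the league gets from that speed+angle, and a negative number means it came up short.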
But it's not all about the distance. What you care about as well is the frequency of hitting balls in the sweet spot launch angles of 8-32 (and hopefully hitting it hard at those angles). And Betts was among the league leaders in both Hard Hit % and Sweet Spot Angle % (11th and 27th respectively). New teammate JD Martinez was 4th and 20th respectively.
As you know, MGL brought up the point that spray angles needed to be part of estimating the value of a batted ball. Which led me down the FIP path of trying to ascertain if we are ultimately trying to evaluate plays or players.
While evaluating each play is a stepping stone to evaluating players, doing so really introduces a whole ton of noise if the player has limited influence on the play. Hence the FIP / BABIP dichotomy.
In taking up MGL's challenge in showing that the spray angle really is more noise than signal in evaluating players (the thing he cares about), I did notice there was a bias based on how hard a ball is hit. Or rather, the kinds of players who hit the ball hard. To make the illustration clear: If Stanton hits a ball to the warning track, it's more likely to be caught than if Billy Hamilton did so. And as baseball fans we can guess easily enough: outfielders play deeper for Stanton. At the same time, this means that balls that fall in over the infielders are more likely to be caught if Hamilton hit it than Stanton. Where the outfielders giveth in one spot, they taketh in another spot. Equilibrium? No!
First, here's the overall data, breaking up all the hitters into five groups, in terms of their median exit velocity. That is one smooth relationship. The harder you hit it, the lower your actual wOBA for the same batted ball that overall weaker hitters would hit.
We can also break it down into the six categories of batted ball based on speed+angle pairs. That's the profile that determines if the batted ball is a barrel, solid contact (or near barrel), flares+burners, and the three poorly hit categories (weak, topped, under). How do our speed group players do?
The barrels and near-barrels tell the main story here: the weaker you are as a hitter, the higher value your barrels are worth. In other words, because we are not using Fielding Alignment as a variable, this bias is exposed at this level of grouping.
We also see the bias a bit with flares and burners in the same direction, as well as weakly hit balls.
It goes the other way on balls that are mishit under, which we reason is because outfielders playing shallow are more likely to catch those balls.
***
We recently upgraded the xwOBA model to include the seasonal Sprint Speed of the batter, on a similar concept: the same groundball hit by Pujols and Buxton has a different value because of how fast they run. Therefore, we can consider a similar idea here: using a batter's power to infer a fielding alignment that is consistent with that kind of power.
Eventually we'll use THE ACTUAL fielding alignment, but not yet.