Thursday, June 06, 2024
Bias in the x-stats? Yes!
The claim that spray direction is the missing ingredient in the x-stats has been thoroughly refuted several times, both by me and by other independent researchers. So the question remains: what are the missing ingredients?
Someone brought up the case of Isaac Paredes, who is a heavy pull batter. However, there is another attribute of Paredes: he does not hit the ball hard. Now, you may think that the x-stats ALREADY account for the exit velocity. After all, the two main ingredients are launch angle and launch speed. We account for the launch speed. Don't we? Well, once again, I must talk about the difference between modeling a PLAY and modeling a PLAYER. The x-stats, traditionally, evaluate PLAYS. But, since we are interested in PLAYERS, we limit the variables so that we focus on the PLAYERS. In other words, yes, we evaluate each play, one at a time. But instead of considering AS MANY variables as we can that went into that play, we consider AS FEW variables as we can: only those over which the player themselves has a strong influence.
Launch speed is an easy one to include on an event by event level. Launch angle as well (the easiest one that separates groundballs from home runs). The Spray Direction is one that is needed on the play, but is not needed for the player (as we've learned many times). So, we ignore that one. We include the Seasonal Sprint Speed of the runner, as that's important on groundballs.
Which gets us back to Launch Speed. Remember last night, I created a profile of each batter, to establish their Spray Tendency? Well, what if we do the same thing, but with Launch Speed? That is, let's create a profile of a batter based on how hard they hit the ball.
Now, you may think: we ALREADY account for this on a play level, right? Yes, we do. But, what if a 100mph batted ball by Isaac Paredes is different from a 100mph batted ball by Giancarlo Stanton, even when both are hit at 20 degrees of launch? In other words, we want Launch Speed to pull double-duty: we want to know the launch speed on that play, but we also want to know the batter's seasonal launch speed.
So, do we see a bias based on a batter's seasonal launch speed? Yes. Yes, we do.
Here's what I did, so feel free to replicate. I treat the 2016-2019 years as one season and the 2020-present years (thru June 5, 2024) as a second season. I do this on the idea that a player has a general speed tendency that spans multiple years. This lets me increase my sample size for each season. I also make sure that a batter who hits from both sides is considered two distinct players.
The speed tendency follows the Escape Velocity method for Adjusted speed: greatest(88, h_launch_speed). For every batted ball, I take the greater of the launch speed and 88. And I average that.
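Here is a minimal sketch of that calculation in Python, for anyone replicating. The floor of 88 and the greatest(88, h_launch_speed) rule come from the description above. The dictionary keys (batter, bat_side, year, launch_speed) and the function names are hypothetical, chosen only for illustration, and the sample rows are made up.

```python
from collections import defaultdict

SPEED_FLOOR = 88.0  # mph floor, i.e. greatest(88, h_launch_speed)


def period_of(year):
    """Map a year to one of the two multi-year 'seasons' described above."""
    if 2016 <= year <= 2019:
        return "2016-2019"
    if year >= 2020:
        return "2020-present"
    raise ValueError(f"year {year} is outside the studied range")


def speed_tendency(batted_balls):
    """Average floored launch speed per (batter, bat side, period).

    Keying on bat side makes a switch hitter two distinct players.
    The field names are assumptions, not a real data schema.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of speeds, count]
    for bb in batted_balls:
        key = (bb["batter"], bb["bat_side"], period_of(bb["year"]))
        totals[key][0] += max(SPEED_FLOOR, bb["launch_speed"])
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Made-up rows, only to show the mechanics.
sample = [
    {"batter": "A", "bat_side": "R", "year": 2018, "launch_speed": 70.0},
    {"batter": "A", "bat_side": "R", "year": 2019, "launch_speed": 100.0},
    {"batter": "A", "bat_side": "L", "year": 2019, "launch_speed": 95.0},
    {"batter": "A", "bat_side": "R", "year": 2023, "launch_speed": 104.0},
]
tendency = speed_tendency(sample)
```

Note how the 70mph ball counts as 88 in the first group, so that batter's tendency is (88 + 100) / 2 = 94.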
Anyway, I use the same Pascal method of binning I did last night, the 10/20/40/20/10 split.
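A rough sketch of a 10/20/40/20/10 split, in Python. One loud assumption: this ranks the batters by speed tendency and splits on the count of batters. The split could just as well be weighted by batted balls, and nothing above settles that, so treat the function below (a hypothetical bin_by_rank) as one possible reading rather than the method itself.

```python
def bin_by_rank(tendencies, shares=(10, 20, 40, 20, 10)):
    """Assign each key to a group 0..4, weakest to strongest, by rank.

    tendencies: dict mapping a batter key to an average adjusted speed.
    shares: percentage of batters per group, which must sum to 100.
    ASSUMPTION: the split is on the number of batters, unweighted.
    """
    if sum(shares) != 100:
        raise ValueError("shares must sum to 100")
    ranked = sorted(tendencies, key=tendencies.get)
    n = len(ranked)

    # Cumulative cut points, rounded to the nearest whole batter.
    cuts = []
    running = 0
    for share in shares:
        running += share
        cuts.append((running * n + 50) // 100)

    groups = {}
    group = 0
    for i, key in enumerate(ranked):
        while i >= cuts[group]:
            group += 1
        groups[key] = group
    return groups


# Ten made-up batters with tendencies of 88, 89, ..., 97 mph.
tendencies = {f"batter_{i}": 88.0 + i for i in range(10)}
groups = bin_by_rank(tendencies)
sizes = [list(groups.values()).count(g) for g in range(5)]
```

With ten batters, the groups hold 1, 2, 4, 2 and 1 batters, from weakest to strongest.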
So, on to the fascinating results. For the weakest batters, the Paredes and Arraez and so on, their xwOBA was .306, while their actual wOBA was .318. That is an enormous bias of 12 wOBA points. The next weakest batters had .339 xwOBA and .345 actual wOBA for a bias of 6 points.
The strongest batters had an xwOBA of .452 and a wOBA of .442, for a 10 point shortfall. The next set of strongest batters had an xwOBA of .411 and a wOBA of .402 for a 9 point shortfall. The middle group was pretty much even.
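To make the arithmetic explicit, here is a small Python check of the figures quoted above, with bias defined as actual wOBA minus xwOBA, in points (thousandths). The group labels and the bias_points helper are my own naming. The middle group is left out because no exact figures are given for it.

```python
# (label, xwOBA, wOBA) exactly as quoted in the text above.
reported = [
    ("weakest", 0.306, 0.318),
    ("next weakest", 0.339, 0.345),
    ("next strongest", 0.411, 0.402),
    ("strongest", 0.452, 0.442),
]


def bias_points(xwoba, woba):
    """Actual minus expected, in wOBA points; positive means xwOBA undersells."""
    return round((woba - xwoba) * 1000)


bias = {label: bias_points(x, w) for label, x, w in reported}

for label, points in bias.items():
    print(f"{label:>15}: {points:+d} points")
```

The signs flip as we move from the weakest to the strongest hitters: +12, +6, -9, -10.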
Now, before we get TOO excited, what else could cause this? I have a few thoughts, but let me just leave this here for now.