Tangotiger Blog


Tuesday, November 19, 2024

Layered wOBAcon

(In re-reading this, I have a lot of ALL CAPS.  I'm not shouting, just emphasizing.  I can edit it to be lowercase bold if this bothers anyone.)

As I'm getting near the end of preparing Layered Hit and HR Probability, let me now turn my attention to Layered wOBA (or more specifically because it's on Batted Balls or Contacts, it's actually wOBAcon).

Layered wOBAcon requires the probability of each 1B, 2B, 3B, HR for each layer.
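To make that concrete, here is a minimal sketch of how four hit probabilities turn into a wOBAcon value for one batted ball. The weights are rounded, illustrative linear weights (the real wOBA weights are set per season), and the function name `expected_wobacon` is hypothetical, not something from the actual implementation.

```python
# Illustrative, rounded wOBA weights per hit type (an assumption; real weights vary by season).
WOBA_WEIGHTS = {"1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}

def expected_wobacon(probs: dict[str, float], weights: dict[str, float] = WOBA_WEIGHTS) -> float:
    """Expected wOBA value of one batted ball, given the probability of each hit type.

    An out contributes zero, so only the four hit types appear in the sum.
    """
    total_hit_prob = sum(probs.values())
    if not 0.0 <= total_hit_prob <= 1.0:
        raise ValueError("hit probabilities must sum to a value between 0 and 1")
    return sum(weights[hit] * p for hit, p in probs.items())

# Hypothetical batted ball: 20% single, 5% double, 1% triple, 4% home run.
example = expected_wobacon({"1B": 0.20, "2B": 0.05, "3B": 0.01, "HR": 0.04})
```

Averaging this value over all of a batter's contacts gives the expected wOBAcon for whichever set of probabilities (whichever layer) was fed in.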

In order to have a baseline, let's just look at how wOBAcon and xwOBAcon correlate with next season's wOBAcon. Among batters with 150 PA in back-to-back seasons, for the 2021-24 seasons, the correlation of wOBAcon (year T) to wOBAcon (year T+1) is r=.49. With an average sample of 330 PA, and r being close to one-half, we also learn that you want to add ~330 PA of league average wOBAcon (that's the prior) to the current year wOBAcon (that's the observed) to estimate next year's wOBAcon (that's the posterior). Remember that stats class where for 13 weeks they went through the horribly named Beta Distribution with the even more horribly named alpha and beta parameters? Yeah, this is what they were talking about.
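The arithmetic behind that can be sketched as follows. This assumes the common shortcut r = n / (n + k), where n is the average sample size and k is the amount of league average to add; it ignores real changes in talent from year to year. The function names and the .370 league average are placeholders for illustration.

```python
def regression_amount(r: float, n: float) -> float:
    """Solve r = n / (n + k) for k, the number of league-average PA to add."""
    return n * (1.0 - r) / r

def shrink(observed: float, n: float, league_avg: float, k: float) -> float:
    """Weighted average of the observed rate (n PA) and the league average (k PA)."""
    return (observed * n + league_avg * k) / (n + k)

# r=.49 with an average of 330 PA gives roughly 343 PA of regression,
# which is the "~330 PA" in round numbers.
k = regression_amount(r=0.49, n=330)

# Hypothetical batter: .400 wOBAcon over 330 PA, against an assumed .370 league average.
estimate = shrink(observed=0.400, n=330, league_avg=0.370, k=k)
```

When r is exactly one-half, k equals n, so the estimate lands halfway between the observed figure and the league average.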

As for xwOBAcon (year T) to wOBAcon (year T+1), that's an r=.60. So, add yet another +1 in the win column for x-stats better describing the talent of players than their actual stats do. Whether W/L v ERA correlating to next year's W/L, or ERA v FIP correlating to next year's ERA, or wOBAcon v xwOBAcon correlating to next year's wOBAcon, it's all part of the same pattern: the observed stats are filled with so much noise (Random Variation) that it hides the actual thing they are purportedly trying to measure.

Alright, so let's get back to Layered wOBAcon. The first layer we have is Launch Speed. How does Layered Speed (year T) correlate to wOBAcon (year T+1)? That's an r=.60! Whoa, that's the SAME as xwOBAcon? What's going on here?

Welcome to my world, where for the last 8 years I've been discussing and describing and otherwise deliberating the PLAY v the PLAYER. When Statcast came out there was this enormous rush to take the specifics of a play (launch speed and launch angle, notably) and use that information to purportedly describe what the player did, when it was in fact simply describing the PLAY. This should have been plainly obvious when it came down to looking at 70-80mph batted balls launched just high enough that you'd get a high hit probability: those balls would land over the infielder and in front of the outfielder. Outside of maybe Ichiro and Arraez, NOBODY intends to do that. Every single batter is trying to hit the ball hard, at least 90mph. And so, batted balls hit at under 80mph are undoubtedly mistakes. There are of course mistakes that lead to good outcomes and mistakes that lead to bad outcomes. But from the perspective of the talent of the player, these are better bundled together as launch speed mistakes.

Similarly, you have what scouts call Major League Outs: these are batted balls that are hit 100+ mph, but at such a high launch angle (45+ degrees) that they end up being very high fly outs. These are better addressed as launch angle mistakes. It takes TREMENDOUS power to mishit a ball to get a 45 degree launch angle and still hit the ball 100+. If you have a batter already at the major league level, these launch angle mistakes are far easier to overcome than launch speed mistakes.

What happens with the x-stats that bundle things together, like xwOBAcon does, is that it is only focused on the PLAY. And so, xwOBAcon looks at the outcome of that combo of speed+angle, and based on the historical outcome of that combination decides how good a hit that was. Doing that removes the individuality of each of the speed and angle.

In other words, this combination approach is actually adding what is analogous to Random Variation in trying to describe the player by essentially overfitting on the play.  From the perspective of the play, it's not an overfit.  From the perspective of the player, it IS an overfit.  We need a paradigm shift here.

This is where a Layered approach comes in. First, we focus on the primary thing that will describe the PLAYER (launch speed) and then we do our best to describe the PLAY. And incredibly, we ALREADY achieve an r=.60 doing only that.

The next layer we add is Launch Angle. Doing that gives us a small boost to r=.64. Adding the Launch Angle as a layer in this manner now allows us to better describe THE PLAYER. Sure, we lose some value in describing THE PLAY, but that's a small (temporary) loss.

From here on out, we can add each layer, one at a time (Carry, Spray Angle, Batter Running Speed, Fielding Alignment, Fielder Performance) so that we can TOTALLY describe the PLAY. You see, in this paradigm shift, by accounting for every variable, we will get an r=1 in terms of describing the hit or out. And by leaving it as Layers, we can then decide which are actually the ones we care about in describing the PLAYER.
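One way to picture the layering is as a telescoping sum: each layer is the change in expected value once one more variable is known, and whatever is left at the end is the gap to the actual outcome. The sketch below is an assumption about how such layers could be built, not the actual method. It uses only two binned variables (speed, then angle) and eight made-up batted balls with rounded wOBA values.

```python
from collections import defaultdict
from statistics import mean

# Made-up batted balls: (launch speed bin, launch angle bin, wOBA value of the outcome).
balls = [
    ("hard", "line", 0.9), ("hard", "line", 2.0),
    ("hard", "high", 0.0), ("hard", "high", 2.0),
    ("soft", "line", 0.9), ("soft", "line", 0.0),
    ("soft", "high", 0.0), ("soft", "high", 0.0),
]

def group_mean(rows, key):
    """Average outcome value of the rows sharing the same key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}

league = mean(b[2] for b in balls)
by_speed = group_mean(balls, lambda b: b[0])
by_speed_angle = group_mean(balls, lambda b: (b[0], b[1]))

def layers(ball):
    """Split one batted ball into (speed layer, angle layer, everything else)."""
    speed, angle, value = ball
    speed_layer = by_speed[speed] - league
    angle_layer = by_speed_angle[(speed, angle)] - by_speed[speed]
    everything_else = value - by_speed_angle[(speed, angle)]
    return speed_layer, angle_layer, everything_else

# League average plus all the layers reproduces the actual outcome of every play.
rebuilt = [league + sum(layers(b)) for b in balls]
```

Because the layers always add back up to the outcome, the PLAY is fully described, while the PLAYER can be described by keeping only the layers that carry over (here, the first one or two) and dropping the rest.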

The most impactful is the launch speed, as we already presumed and surmised. The batter's running speed is also important: this is really a trait ingrained in the player. See, this is what we are after here, to establish a tool or trait for each player. Launch Speed is a powerful trait (a combination of Bat Speed and Quality of Contact), and at the major league level, we've already selected for players to have decent Quality of Contact. Running speed is a natural trait as well.

The next on the list is Launch Angle, at about 20% the weight of Launch Speed.

The Carry layer is impactful, but in a negative sense. While we can describe the individual plays by how much Carry the ball has (whether it's by the spin imparted, or the wind or the specific traits of that particular snowflake of a ball), these actually are not helping in describing the batter.

The Spray Layer has almost no weight at all. With a p-value of 0.51, it becomes an easy feature to ignore. Yes, you need it to describe THE PLAY. But when it comes to describing the effectiveness of the player, we don't need the Spray Layer. Yes, it becomes useful to describe the PROFILE of the player (pull, spray, etc), but not their overall performance.

The Fielding Alignment Layer has no weight at all in describing the player.

So, there you have it, the three critical components are Launch Speed, Launch Angle, and Running Speed. Exactly what we already have in the x-stats. Except re-arranged and approached in a different way to better describe the player than the x-stats. And dependent on the remaining layers (Carry, Spray, Fielding Alignment, Fielder Performance) to better describe the play than the x-stats.


(1) Comments • 2024/11/23 • Batted_Ball
