Tangotiger Blog

Sunday, December 04, 2022

Spray Angle overfits xwOBA

By Tangotiger

I'm always interested in saber research, and I constantly seek it out. The latest piece I found, on Reddit, is called 3D wOBA. First off: what a great name! I wish I had thought of it.

Anyway, the researcher notes that xwOBA is principally driven by launch speed and launch angle, and points out that it does not use the Spray Angle. Which is true. And intentional.

As I've said many times in the past: it's a question of describing the PLAY or the PLAYER. Why is FIP popular? Because it describes the PLAYER. Doesn't giving up 0 hits in a game mean it's a great game? Yes. But, does it mean it's a great PITCHER? No, not necessarily. That's because hits that stay in the park are subject to a great deal of random variation having nothing to do with the pitcher himself. You have the fielders, the fielding alignment, and the park. Not to mention that most pitchers are so similar in BABIP talent that it requires a GREAT deal of batted balls to find the signal amongst all that noise.

Anyway, back to the matter at hand. The researcher very (very) helpfully provided the data. And so, it took just a couple of minutes for me to do the test I needed to do: compare wOBAcon, xwOBAcon, and 3DwOBAcon to NEXT SEASON'S wOBAcon. Why do we want to do that? Because that data is unbiased. It's describing the PLAYER. And that is what I care about. And really, when you think about it, most of the time, that's what you care about too.

Anyway, here are the correlations. A straight wOBAcon to next-season wOBAcon correlation is r=0.55 (the sample had an average of 359 batted balls). This gives us a ballast value (the regression amount) of 293 batted balls.
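For anyone who wants to check the arithmetic: the 293 is consistent with the usual relationship r = N / (N + B), solved for B. Here is a minimal Python sketch under that assumption. The function names, the example batter, and the league average are all made up for illustration.

```python
def ballast(r: float, n: float) -> float:
    """Regression amount B implied by r = n / (n + B), i.e. B = n * (1 - r) / r."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    return n * (1.0 - r) / r


def regress(observed: float, n: float, mean: float, b: float) -> float:
    """Shrink an observed rate toward the mean by adding b batted balls of average performance."""
    return (observed * n + mean * b) / (n + b)


if __name__ == "__main__":
    b = ballast(r=0.55, n=359)
    print(f"ballast: {b:.1f} batted balls")

    # Hypothetical batter: .450 wOBAcon on 359 batted balls, with an assumed .370 average.
    print(f"regressed wOBAcon: {regress(0.450, 359, 0.370, b):.3f}")
```

With r=0.55 and 359 batted balls this comes out a shade under 294, so the quoted 293 presumably reflects rounding in the inputs. The second function shows what the ballast is for: you add that many batted balls of average performance to the player's actual line.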

How about the xwOBAcon from Savant (as shown in their spreadsheet anyway)? That's a correlation of r=0.56. We learn a little bit more, but not much more, than just using their actual performance. But at least, directionally, it's where we want it.

Now, finally the superbly named 3DwOBAcon, how did it do? Correlation of r=0.52. Wait, it's LOWER? Yes, it is. And this is very typical when you overfit your data. You are so focused on trying to explain THAT PLAY that you ignore what really matters here: the players themselves.

When you instead correlate each metric with the actual wOBAcon on the same plays, the Savant xwOBAcon has a correlation of r=0.83 while 3DwOBAcon is slightly higher at r=0.85. The good thing here is that the researcher's model, tuned the way they wanted it, works fine. The extra dimension, the spray angle, does in fact help describe the play in question a little better.

But, from the PLAYER perspective, the spray angle is mostly noise. And so when you use that information as a critical component to describe the player's TALENT, and so to predict next season, you introduce noise into that prediction. It's like trying to use ERA to explain ERA next season instead of FIP. Or use win% to predict next season's win% instead of using ERA to predict next season's win%.
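The effect is easy to reproduce with a toy simulation. This is only a sketch, not the actual data or either model: it assumes every player has a fixed talent, that spray luck is pure noise redrawn each season, and that all the standard deviations are arbitrary values picked for illustration. The "2D" metric sees talent plus a bit of measurement noise; the "3D" metric also sees this season's spray luck.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_players=5000, seed=0):
    """Toy model: all parameters below are assumptions, not fitted to real data."""
    rng = random.Random(seed)
    two_d, three_d, same_season, next_season = [], [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0, 1)            # persistent player skill
        contact_noise = rng.gauss(0, 0.5)   # noise in the speed+angle estimate
        spray_luck_1 = rng.gauss(0, 1)      # season 1 spray luck (pure noise here)
        spray_luck_2 = rng.gauss(0, 1)      # season 2 spray luck, independent of season 1
        other_luck_1 = rng.gauss(0, 1)      # fielders, park, etc. in season 1
        other_luck_2 = rng.gauss(0, 1)      # same, season 2

        two_d.append(talent + contact_noise)
        three_d.append(talent + contact_noise + spray_luck_1)
        same_season.append(talent + spray_luck_1 + other_luck_1)
        next_season.append(talent + spray_luck_2 + other_luck_2)

    return {
        "2d_same": pearson(two_d, same_season),
        "3d_same": pearson(three_d, same_season),
        "2d_next": pearson(two_d, next_season),
        "3d_next": pearson(three_d, next_season),
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: r={r:.2f}")
```

In this setup the 3D metric wins on same-season correlation and loses on next-season correlation, which is the same direction as the real numbers above. The sizes of the gaps mean nothing, since they depend entirely on the made-up variances.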

And this is why, by and large, we don't use BABIP to evaluate pitchers. And this is why, by and large, we don't use spray angles to evaluate batters.


