Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Monday, September 12, 2022

Statcast Lab: Layered HR Probability

A player hit a HR: what is the chance that he would have hit a HR? That seems like an odd question. But that seems to be the sequence of events, right? A player hits a HR. Then we check the HR probability. Underlying that is an "if not for...". So, the real sequence is a bit different. A player hit a HR: if not for ___, what is the chance that he would have hit a HR?

And so, buried in the sentence is the most important part: what is it that we are trying to isolate? Are we trying to remove the ballpark? Are we trying to remove the spray direction? Are we trying to remove the spin of the ball? Are we trying to remove the unique ball construction properties (since every ball is as unique as a snowflake)? Are we trying to remove the launch angle? The launch speed? What exactly are we trying to get as a probability?

Then just as important: WHY do we care? Are we trying to explain that particular play? Or are we interested in that particular player? The reason that DIPS (through FIP) has taken hold of the landscape is that we are trying to isolate the player's direct contribution. When a pitcher allows ten hits with no home runs, we know, because of DIPS, that a lot of that is not necessarily because of the pitcher. It's why we focus so much of a pitcher's contribution on his strikeouts, walks, hit batters, and home runs. We are trying to isolate those things to which the pitcher has a more direct contribution. "If not for the fielders and the fielding alignment..." is the precursor. That's why we care about FIP.

When it comes to the homerun, things are a bit shakier because there are so many interesting questions to ask, with some relating to the play itself, and some relating to the player only. How can we get there? Welcome to our paradigm shift: Layered HR Probability.

Reposting from this article:


And at one of our many meetings on the subject with a revolving door of different folks chiming in, former Baseball Savant savant Daren Willman noted, paraphrasing:

If we start to consider everything, then every play will either have a 100% or 0% Hit or Out Probability.


And that was the key for me. That’s what cemented the paradigm shift.

Everyone can see the hit. It’s 100% a hit. The question we want to ask therefore is how is it a hit, why is it a hit? How hard did the batter hit the ball? What was the launch angle? The spray direction, and where were the fielders? How good are those fielders, and how well did they perform on that play? What park was that hit in, and how hot was it, and what’s the elevation and how far was the fence? And how fast is that batter as a runner? So rather than coming up with something rather ambiguous or confusing like a 36% Hit Probability, we can instead ascribe the probability of that hit to each component of the context of the play, at that point in time and space. And the key: make sure it adds up to 100% Hit or 100% Out.

And the user can then choose for themselves what they mean by Hit Probability. They can add up only those components that they are interested in. If you are like me, and want to focus on launch speed and angle, so be it. But if others want to include other components, they can do so as well. When Nolan Arenado hits 200 homeruns since 2016, rather than say he should have hit 150 (or 175 or 190) HR if not for Coors, wouldn’t it be better to establish how much every component contributed to the 200 HR, rather than just one? How much of those 200 HR is a result of his power, or his launch angle, or his spray direction, or his many parks, and so on? How does it all add up? We have 200 actual HR, not 150 (or 175 or 190). We want it to all add up to 200.


The model is nearing completion. So, let's get to it with a couple of examples from everyone's favourite player: Mookie Betts.

In this play (video), Betts hit his hardest ball of the year, at 109 mph. Based on the exit speed, we'd expect a HR about 25% of the time. Since making contact leads to a HR about 5% of the time, we can give this breakdown:

+5%: making contact
+20%: ... at 109 mph

That's 25%.

But, as we saw, that was a line drive at 12 degrees of launch. And at 12 degrees, no one, not even Giancarlo Stanton, is getting a homerun. That's why we did not see a homerun there. So, we have this breakdown:
+5%: making contact
+20%: ... at 109 mph
-25%: ... at 12 degrees launch

That's 0%. We did not see a HR, and this is the reason.

Ok, so that was boring. Let's now find an actual homerun. And we've got ourselves a really good one (video). This one was hit at 108 mph, a speed at which about 23% of batted balls lead to homeruns. So we have this initial breakdown:
+5%: making contact
+18%: ... at 108 mph

It was launched at 21 degrees. That's getting into viable, though not overwhelming, HR territory. A ball hit at 108 mph, at 21 degrees, leads to a homerun 40% of the time. Let's update our breakdown:
+5%: making contact
+18%: ... at 108 mph
+17%: ... at 21 degrees

So, we are now at 40%. Since we know a homerun did happen (100%), we still have to explain the missing 60%.

That ball had a distance of 416 feet, which is longer than the typical ball hit at 108/21. Whether it was the wind, or the spin, or the snowflake-properties, we know that balls hit 416 feet lead to a HR 88% of the time. Updating our breakdown:
+5%: making contact
+18%: ... at 108 mph
+17%: ... at 21 degrees
+48%: ... at 416 feet

Add up everything, and we're at 88%.

Now, that ball was hit to almost dead center. That doesn't help in getting homeruns. Indeed, at that distance, at that spray direction, it's a homerun in 22 of 30 ballparks, or 73%. So, that spray direction cost Betts 15%. Our current breakdown:
+5%: making contact
+18%: ... at 108 mph
+17%: ... at 21 degrees
+48%: ... at 416 feet
-15%: ... at almost dead center

But Betts does not swing at all 30 ballparks. He swings at one ballpark. And in this case, it's the Miami ballpark. And as we saw, it did in fact clear the fence. And so, we have now accounted for the last of the variables:
+5%: making contact
+18%: ... at 108 mph
+17%: ... at 21 degrees
+48%: ... at 416 feet
-15%: ... at almost dead center
+27%: ... at Miami

And there you go, 100%. And repeating the words of Daren, "If we start to consider everything, then every play will either have a 100% or 0% [Homerun] Probability."
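The arithmetic behind the breakdown can be sketched in a few lines of Python. This is only an illustration of the differencing idea, not the actual Statcast model: the running probabilities are the ones quoted above, while the function name `layer_contributions` and the list layout are assumptions made for this sketch.

```python
# Running HR probability (in percent) after each piece of context is added.
# Values are the ones quoted in the post for the second Betts play.
cumulative = [
    ("making contact", 5),
    ("at 108 mph", 23),
    ("at 21 degrees", 40),
    ("at 416 feet", 88),
    ("at almost dead center", 73),
    ("at Miami", 100),
]


def layer_contributions(steps):
    """Return (label, delta) pairs: each layer's change from the previous total.

    Hypothetical helper for illustration only.
    """
    layers = []
    previous = 0
    for label, total in steps:
        layers.append((label, total - previous))
        previous = total
    return layers


layers = layer_contributions(cumulative)
for label, delta in layers:
    print(f"{delta:+d}%: ... {label}")
print("total:", sum(delta for _, delta in layers))
```

Feeding the same helper the first play (5, then 25, then 0) gives the +5, +20, -25 breakdown, which sums to 0.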

And that is what we have found. Having isolated each component of this play, we can now explain this play however we want to explain it. If we just want to focus on those things that Betts has the largest control over (contact, launch angle and speed), then we'd say "40%". If we want to consider everything except the ballpark, then we'd say 22/30 or "73%". And if you want to come up with any combination you want, then you are now empowered to say anything you want.
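Picking your own definition of HR Probability then amounts to summing whichever layers you care about. Here is a minimal sketch, assuming the layers are stored as a plain dictionary of percentage points; the key names and the `hr_probability` helper are made up for this illustration.

```python
# Layer contributions (percentage points) for the Betts homerun in Miami,
# as listed in the post. Key names are hypothetical shorthand.
layers = {
    "contact": 5,
    "speed": 18,
    "angle": 17,
    "distance": 48,
    "spray": -15,
    "park": 27,
}


def hr_probability(layers, include):
    """Sum only the chosen layers (hypothetical helper for illustration)."""
    return sum(layers[name] for name in include)


# What the batter most directly controls: contact, launch speed, launch angle.
batter_controlled = hr_probability(layers, ["contact", "speed", "angle"])

# Everything except the ballpark.
all_but_park = hr_probability(layers, [name for name in layers if name != "park"])

# Everything: the homerun we actually saw.
everything = hr_probability(layers, layers)

print(batter_controlled, all_but_park, everything)
```

With the numbers above, the three choices come out to 40, 73 and 100.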

You want to know how that ball was a homerun, why it's a homerun? Well, now you know.

And of course, we'll be able to aggregate this across the season, for each player, so we can isolate each of the components and the extent to which each is contributing to the homeruns. We're in a position to both describe the play and the player, at the same time.
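As a rough sketch of what that aggregation could look like, here are the two Betts plays from this post summed component by component. The dictionary layout and the `aggregate` helper are assumptions for illustration, and a real season would of course have far more plays than these two.

```python
from collections import Counter

# Layer contributions (percentage points) for the two Betts plays in the post.
# The first play stops at launch angle because that layer already took it to 0%.
plays = [
    {"contact": 5, "speed": 20, "angle": -25},
    {"contact": 5, "speed": 18, "angle": 17, "distance": 48, "spray": -15, "park": 27},
]


def aggregate(plays):
    """Sum each component across plays (hypothetical helper for illustration)."""
    totals = Counter()
    for play in plays:
        totals.update(play)  # adds values per key, keeping negatives
    return dict(totals)


totals = aggregate(plays)
for name, points in totals.items():
    print(f"{name}: {points:+d}")

# 100 percentage points per actual homerun, so this is the HR count.
print("homeruns:", sum(totals.values()) / 100)
```

Across these two plays, the components add up to exactly one homerun, which is what Betts actually hit.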
