Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Thursday, May 20, 2021

Math behind no-hitters

The Math

In the 10 years from 2010-2019, there was an average of 8.7 hits per 9 IP. The hit rate (meaning the batting average if we include SF in the denominator) was .252, which means the out rate is one minus .252, or .748. In order to get a no-hitter, you need 27 outs. The probability of that is .748 to the power of 27. That works out to 0.0004 per game (counting each team's game separately, so 4,860 in a full season), or 4 per 10,000 games, or 1.9 per full season. There has actually been an average of 3.4 no-hitters per season in that time span. Why is the math off?
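As a quick check of the arithmetic above, here is a minimal Python sketch. The 4,860 team-games per season (30 teams times 162 games) is my assumption about what "per game" means here; it is the figure that reproduces the 1.9 per season. The function and variable names are illustrative only.

```python
# Assumption: "per game" is per team-game, and a full season has
# 30 teams x 162 games = 4,860 team-games.
OUT_RATE = 0.748
OUTS_NEEDED = 27
TEAM_GAMES_PER_SEASON = 30 * 162


def no_hitter_probability(out_rate: float, outs: int = OUTS_NEEDED) -> float:
    """Chance of getting `outs` outs without a hit at a constant out rate."""
    return out_rate ** outs


p_game = no_hitter_probability(OUT_RATE)
expected_per_season = p_game * TEAM_GAMES_PER_SEASON

print(f"per team-game:   {p_game:.6f}")
print(f"per 10,000:      {p_game * 10_000:.1f}")
print(f"per full season: {expected_per_season:.2f}")
```

Swapping in a different out rate shows how sensitive the result is: a change of one or two points in the rate moves the season total a lot once it is raised to the 27th power.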

The Assumptions

Well, the math is not off. The assumption is off. We are assuming that each pitcher has a .748 out rate. But some pitchers are much better and some are much worse. And some teams are a bit better and some teams are a bit worse. And when you raise that number to the power of 27, those differences compound: the better-than-average games add far more no-hitter chances than the worse-than-average games take away. In order to "adjust" for the distribution of players, we can modify the mean out rate of .748 upwards by .016 to .764. So that becomes our "effective" out rate for the population. Raise that to the power of 27 and you get 0.0007 per game, or 7 per 10,000 games, or 3.4 per full season. And that's what we've witnessed: 34 no-hitters over the ten years of 2010-2019.
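To see why a spread in out rates pushes the "effective" rate above the mean, here is a rough sketch. The post does not say what distribution was used, so everything about the spread is an assumption: I use a normal distribution of game-level out rates with a standard deviation of .031, a value picked only because it lands close to the .016 adjustment quoted above.

```python
import math

MEAN_OUT_RATE = 0.748
OUTS_NEEDED = 27
# Hypothetical spread in game-level out rates (pitcher, defense, opponent).
# Illustrative assumption only; not a figure from the post.
ASSUMED_SD = 0.031


def normal_grid(mean, sd, half_width=5.0, steps=1001):
    """Discretize a normal curve into (value, weight) pairs; weights sum to 1."""
    xs = [mean + sd * half_width * (2 * i / (steps - 1) - 1) for i in range(steps)]
    ws = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in xs]
    total = sum(ws)
    return [(x, w / total) for x, w in zip(xs, ws)]


grid = normal_grid(MEAN_OUT_RATE, ASSUMED_SD)

# Everyone at the mean: raise the average rate to the 27th power.
naive = MEAN_OUT_RATE ** OUTS_NEEDED
# Mixed population: average the 27th powers instead.
mixed = sum(w * x ** OUTS_NEEDED for x, w in grid)
# The single rate that would give the same no-hitter chance as the mix.
effective_rate = mixed ** (1 / OUTS_NEEDED)

print(f"naive per-game chance: {naive:.6f}")
print(f"mixed per-game chance: {mixed:.6f}")
print(f"effective out rate:    {effective_rate:.4f}")
```

Averaging the 27th powers always gives at least as much as the 27th power of the average, so the effective rate sits above .748 for any spread you assume. Only the size of the gap depends on the assumed standard deviation.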

So, with our model in place, how did 2020 compare and how does 2021 compare? For 2020, we would have expected 1.72, and we got 2. So, that's reasonable enough.

The 2021

In 2021, however, given about a quarter of the season, and even with the hit rate down all the way to 8.0 per 9 IP this season, we'd have expected 1.65 so far. We instead have 6. That difference, +4.35 no-hitters above expected, is 3.4 standard deviations from the mean. A z-score of 3.4 is not something you expect to find unless you are looking at hundreds or thousands of scenarios. From 2010-2019, for example, the z-scores ranged from -1.25 to +2.02, with a standard deviation of 1.12. We typically expect to see the range at -2 to +2 with a standard deviation of 1. So, that's why we didn't think twice in 2015, when we had 7 no-hitters compared to the 3.3 we expected. Being at +3.7 no-hitters above expected is a z-score of 2.0. That's not a story.
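The z-scores quoted above can be reproduced if the count of no-hitters is treated as Poisson, so that the standard deviation is the square root of the expected count. The post does not state that this is the method; it is my assumption, though it does match both the 2021 and 2015 figures. A minimal sketch:

```python
import math


def poisson_z(observed: float, expected: float) -> float:
    """z-score assuming a Poisson count, where sd = sqrt(expected)."""
    return (observed - expected) / math.sqrt(expected)


z_2021 = poisson_z(observed=6, expected=1.65)
z_2015 = poisson_z(observed=7, expected=3.3)

print(f"2021 so far: z = {z_2021:.2f}")
print(f"2015:        z = {z_2015:.2f}")
```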

The Story

A 3.4 z-score is a story. Seeing 6 no-hitters already would argue in favor of a league allowing 7.5 hits per 9 IP, not the 8.0 we're actually seeing. So we have a conflict here. We do see 8 hits per 9 IP, but we also see 6 no-hitters. Now, while a 3.4 z-score is high, it's not equivalent to a z-score of 5 or 10. In other words, it's not astronomical.

The benefit of the season being 1/4th over is we have another 3/4ths of the season to go. To maintain a z-score of 3.4, we'd have to end the season with ~15 no-hitters. So, if what we are seeing is in fact real, then we should see another 9 no-hitters. On the other hand, if there's nothing extra-special happening, and if we just rely on the 8 hits per 9 IP, then we should see another 4-5 no-hitters this season.
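The two end-of-season scenarios can be sketched the same way. This assumes exactly a quarter of the season has been played, that the expected pace of 1.65 per quarter holds for the rest of it, and the same Poisson-style standard deviation (square root of the expected count); none of those are spelled out in the post.

```python
import math

EXPECTED_SO_FAR = 1.65
OBSERVED_SO_FAR = 6
SEASON_FRACTION = 0.25  # "about a quarter of the season" (assumed exact)
TARGET_Z = 3.4

# Expected no-hitters over a full season at the current hit rate.
expected_full = EXPECTED_SO_FAR / SEASON_FRACTION

# Season total that would keep the z-score at 3.4, with sd = sqrt(expected).
total_at_target_z = expected_full + TARGET_Z * math.sqrt(expected_full)

additional_if_real = total_at_target_z - OBSERVED_SO_FAR
additional_if_random = expected_full - EXPECTED_SO_FAR

print(f"expected over full season:   {expected_full:.1f}")
print(f"total needed for z of 3.4:   {total_at_target_z:.1f}")
print(f"additional if effect real:   {additional_if_real:.1f}")
print(f"additional if just variance: {additional_if_random:.2f}")
```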

And that's what the argument is going to boil down to: 4-5 more no-hitters, and it's just an early-season story. 9 more no-hitters, and we've witnessed something extra special beyond whatever Random Variation would explain.

