Thursday, May 20, 2021
Math behind no-hitters
The Math
In the 10 years from 2010 to 2019, there was an average of 8.7 hits per 9 IP. The hit rate (meaning the batting average if we include SF in the denominator) was .252, which means the out rate is one minus .252, or .748. To get a no-hitter, you need 27 outs, and the probability of that is .748 to the power of 27. That works out to 0.0004 per game, or 4 per 10,000 games, or 1.9 per full season. There was actually an average of 3.4 no-hitters per season in that time span. Why is the math off?
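Here is a minimal Python sketch of that arithmetic, for anyone who wants to check it. The one number not stated above is the count of pitching chances in a season: I am assuming 30 teams times 162 games, with each team's side of a game counted separately, because that is what reproduces the 1.9 figure.

```python
# Naive model: every batter is retired with the same league-wide out rate.
HIT_RATE = 0.252                    # from the post
OUT_RATE = 1 - HIT_RATE             # .748
OUTS_NEEDED = 27

# Assumption (not stated in the post): 30 teams x 162 games each,
# counting each team's side of a game as one no-hitter chance.
TEAM_GAMES_PER_SEASON = 30 * 162


def no_hitter_prob(out_rate, outs=OUTS_NEEDED):
    """Chance of recording `outs` straight outs with no hit mixed in."""
    return out_rate ** outs


p_naive = no_hitter_prob(OUT_RATE)
naive_per_season = p_naive * TEAM_GAMES_PER_SEASON

print(f"per game:         {p_naive:.4f}")
print(f"per 10,000 games: {p_naive * 10_000:.1f}")
print(f"per full season:  {naive_per_season:.1f}")
```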
The Assumptions
Well, the math is not off. The assumption is off. We are assuming that each pitcher has a .748 out rate. But some pitchers are much better and some are much worse, and some teams are a bit better and some are a bit worse. When you raise that number to the power of 27, you get an exponential difference, not a simple difference, so the better pitchers add more no-hitter chances than the worse pitchers take away. In order to "adjust" for the distribution of players, we can modify the mean out rate of .748 upwards by .016 to .764. That becomes our "effective" out rate for the population. Raise that to the power of 27 and you get 0.0007 per game, or 7 per 10,000 games, or 3.4 per full season. And that's what we've witnessed: 34 no-hitters over the ten years of 2010-2019.
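The sketch below does two things. First it reruns the power-of-27 calculation with the .764 effective out rate. Then it illustrates why a spread of talent pushes the effective rate above the mean. The five out rates in the second part are made up purely for illustration, not measured from any real pitchers, and the 30 x 162 team-games figure is again my assumption.

```python
from statistics import mean

OUTS_NEEDED = 27
TEAM_GAMES_PER_SEASON = 30 * 162    # assumption: each team's side of a game counted

# Part 1: the effective out rate from the post.
effective_out_rate = 0.748 + 0.016  # .764
p_effective = effective_out_rate ** OUTS_NEEDED
effective_per_season = p_effective * TEAM_GAMES_PER_SEASON
print(f"per game:        {p_effective:.4f}")
print(f"per full season: {effective_per_season:.1f}")

# Part 2: hypothetical spread of pitcher out rates, centered on .748.
# These values are invented for illustration only.
hypothetical_out_rates = [0.708, 0.728, 0.748, 0.768, 0.788]

mean_out_rate = mean(hypothetical_out_rates)
# Average the no-hitter chance pitcher by pitcher, instead of
# raising the average out rate to the 27th power.
mean_no_hit_prob = mean(r ** OUTS_NEEDED for r in hypothetical_out_rates)
# The single out rate that would give that same average chance.
implied_effective_rate = mean_no_hit_prob ** (1 / OUTS_NEEDED)

print(f"mean out rate:          {mean_out_rate:.3f}")
print(f"implied effective rate: {implied_effective_rate:.3f}")
```

Whatever spread you plug in, the implied effective rate comes out above the mean, because raising to the 27th power rewards the good pitchers more than it punishes the bad ones.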
So, with our model in place, how did 2020 compare and how does 2021 compare? For 2020, we would have expected 1.72, and we got 2. So, that's reasonable enough.
The 2021 Season
In 2021, however, given about a quarter of the season, and even with the hit rate this season down all the way to 8.0 per 9 IP, we'd have expected 1.65 so far. We instead have 6. That difference, +4.35 no-hitters above expected, is 3.4 standard deviations from the mean. A z-score of 3.4 is not something you expect to find unless you are looking at hundreds or thousands of scenarios. From 2010-2019, for example, the z-scores ranged from -1.25 to +2.02, with a standard deviation of 1.12. We typically expect to see the range at -2 to +2 with a standard deviation of 1. So, that's why we didn't think twice in 2015, when we had 7 no-hitters compared to the 3.3 we expected. Being at +3.7 no-hitters above expected is a z-score of 2.0. That's not a story.
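For reference, here is a small sketch of how those z-scores can be reproduced. The standard deviation is not spelled out above, so I am assuming it is the square root of the expected count, as it would be for a Poisson-style count of rare events. That assumption matches both the 3.4 and the 2.0.

```python
import math


def z_score(observed, expected):
    """Standard deviations above expected.

    Assumption: the standard deviation of the count is sqrt(expected),
    as for a Poisson count of rare events.
    """
    return (observed - expected) / math.sqrt(expected)


z_2021 = z_score(observed=6, expected=1.65)   # 2021 so far
z_2015 = z_score(observed=7, expected=3.3)    # full 2015 season

print(f"2021 so far: {z_2021:.1f}")
print(f"2015:        {z_2015:.1f}")
```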
The Story
A 3.4 z-score is a story. Seeing 6 no-hitters already would argue in favor of a league allowing 7.5 hits per 9 IP, not the 8.0 we're actually seeing. So we have a conflict here. We do see 8 hits per 9 IP, but we also see 6 no-hitters. Now, while a 3.4 z-score is high, it's not equivalent to a z-score of 5 or 10. In other words, it's not astronomical.
The benefit of the season being one-quarter over is that we have another three-quarters of the season to go. To maintain a z-score of 3.4, we'd have to end the season with ~15 no-hitters. So, if what we are seeing is in fact real, then we should see another 9 no-hitters. On the other hand, if there's nothing extra-special happening, and if we just rely on the 8 hits per 9 IP, then we should see another 4-5 no-hitters this season.
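A rough sketch of where those two targets come from, under two assumptions of mine: "about a quarter" is treated as exactly one quarter of the season, and the standard deviation is again the square root of the expected count.

```python
import math

EXPECTED_SO_FAR = 1.65    # from the post
OBSERVED_SO_FAR = 6       # from the post
SEASON_FRACTION = 0.25    # assumption: exactly one quarter played
TARGET_Z = 3.4

# Scale the expectation up to a full season.
expected_full_season = EXPECTED_SO_FAR / SEASON_FRACTION
# Assumption: Poisson-style spread, sd = sqrt(expected).
sd_full_season = math.sqrt(expected_full_season)

# Scenario 1: the effect is real and the z-score holds at 3.4.
total_if_real = expected_full_season + TARGET_Z * sd_full_season
more_if_real = total_if_real - OBSERVED_SO_FAR

# Scenario 2: nothing special, the rest of the season runs at the expected rate.
more_if_random = expected_full_season - EXPECTED_SO_FAR

print(f"season total needed for z = 3.4: {total_if_real:.1f}")
print(f"more no-hitters if real:         {more_if_real:.1f}")
print(f"more no-hitters if random:       {more_if_random:.1f}")
```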
And that's what the argument is going to boil down to: 4-5 more no-hitters, and it's just an early-season story. 9 more no-hitters, and we've witnessed something extra special beyond whatever Random Variation would explain.