Friday, June 04, 2021
Pascal’s Run Values
A couple of years ago, I introduced a technique to analyze data in a powerful yet simple manner. I'm sure some form or other has been around long before I used it. Indeed, I was inspired by its use when I first saw former Statcast intern Tess Kolp use something like it.
One of the methods I use often is to group data based on the values. All 99.5 to 100.5 mph batted balls go into the 100 mph bucket. The 100.5 to 101.5 mph batted balls go into the 101 mph bucket, and so on. Simple enough, and it works quite well. Where it doesn't work well is when you group by temperature, as one example. Games at 40 F and 105 F are few and far between, while those at 70 to 75 F are overwhelmingly common. Once you bin the data, if you don't track the weight (number of games, PA, etc.) that makes up each bucket, then you will let those few games at 40 F carry the same weight as those at 70 F.
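As a minimal sketch of that weighting problem, the snippet below bins made-up games by rounded temperature and compares an unweighted average of bucket means against one weighted by games per bucket. The data, the function name `bin_by_rounding`, and the choice of runs as the outcome are all assumptions for illustration only.

```python
import math
from collections import defaultdict

def bin_by_rounding(values, outcomes):
    """Group outcomes into 1-unit buckets, keeping each bucket's mean and count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, outcome in zip(values, outcomes):
        bucket = math.floor(value + 0.5)  # 99.5 up to (not including) 100.5 -> 100
        totals[bucket] += outcome
        counts[bucket] += 1
    return {b: (totals[b] / counts[b], counts[b]) for b in sorted(counts)}

# Hypothetical data: game-time temperature (F) and runs scored in that game.
temps = [40, 70, 70, 70, 71, 71]
runs = [12, 8, 9, 10, 9, 9]

buckets = bin_by_rounding(temps, runs)

# Treating every bucket alike lets the single 40 F game count as much as three 70 F games.
unweighted = sum(mean for mean, _ in buckets.values()) / len(buckets)

# Weighting each bucket by its number of games recovers the plain per-game average.
weighted = (sum(mean * n for mean, n in buckets.values())
            / sum(n for _, n in buckets.values()))

print(buckets)               # {40: (12.0, 1), 70: (9.0, 3), 71: (9.0, 2)}
print(unweighted, weighted)  # 10.0 9.5
```

The lone 40 F game pulls the unweighted figure up to 10.0, while the weighted figure stays at the true per-game average of 9.5.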
What you can do instead is create a percentile bucket: order all the games from warmest to coldest, then put the 1% warmest games in one bucket, the next 1% warmest games in another bucket, and so on. (Thank you Tess!) This way, you have 100 buckets, equally weighted (same number of games). This also works quite well. Where it becomes a mess is close to the median (the 50th percentile): the temperatures in the 49th, 50th and 51st buckets are going to be extremely close to one another. Any up/down results you see around that 50th percentile mark will be, essentially, pure Random Variation.
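The crowding near the median can be seen in a small simulation. This is only a sketch: the temperatures are randomly generated from a bell curve (an assumption, not real game data), and `percentile_buckets` is a hypothetical helper name.

```python
import random

def percentile_buckets(values, n_buckets=100):
    """Sort from highest to lowest and cut into n_buckets equal-count buckets."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [ordered[i * n // n_buckets:(i + 1) * n // n_buckets]
            for i in range(n_buckets)]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical data: 10,000 game temperatures, bell-shaped around 72 F.
rng = random.Random(0)
temps = [rng.gauss(72, 10) for _ in range(10_000)]

buckets = percentile_buckets(temps)
means = [mean(b) for b in buckets]

gap_at_extreme = means[0] - means[1]   # warmest 1% vs next-warmest 1%
gap_at_median = means[49] - means[50]  # the two buckets straddling the median
print(round(gap_at_extreme, 2), round(gap_at_median, 2))
```

Every bucket holds the same 100 games, but the two buckets around the median differ by a small fraction of a degree, so any difference in results between them is mostly noise.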
The technique I introduced is a merge of these two bucketing approaches. I still order the data based on percentiles. But now, instead of creating 100 buckets of 1% of the data each, I create 5 buckets, where the two most extreme buckets each contain 10% of the data. The middle bucket contains 40% of the data. The two remaining buckets contain 20% of the data each. It looks like this: 10%, 20%, 40%, 20%, 10%.
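One way to sketch the 10/20/40/20/10 split in code is below. The function name `pascal_groups`, the rank-midpoint rule for deciding which bucket a value falls in, and the handling of ties (by original order) are assumptions, not details given in the post; the ten speeds are made up.

```python
def pascal_groups(values, shares=(10, 20, 40, 20, 10)):
    """Return a group number (1 = highest values) for each entry of `values`."""
    n = len(values)
    # Indices of the values, ordered from highest to lowest.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)

    # Cumulative cutoffs expressed as rank positions, e.g. 1, 3, 7, 9, 10 for n = 10.
    cutoffs = []
    running = 0
    for share in shares:
        running += share
        cutoffs.append(running * n / 100)

    groups = [0] * n
    for rank, idx in enumerate(order):
        midpoint = rank + 0.5  # midpoint of this rank's slot avoids boundary ties
        for group, cut in enumerate(cutoffs, start=1):
            if midpoint < cut:
                groups[idx] = group
                break
    return groups

# Hypothetical speeds (mph) for ten pitches, in the order they were thrown.
speeds = [93.1, 94.8, 92.0, 93.5, 93.9, 95.2, 92.7, 93.3, 94.1, 93.6]
groups = pascal_groups(speeds)
print(groups)  # [4, 2, 5, 3, 3, 1, 4, 3, 2, 3]
```

With ten values, the single highest lands in Group 1, the next two in Group 2, the middle four in Group 3, the next two in Group 4, and the lowest in Group 5.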
Fans of Pascal's triangle may see a glimmer in there. The 4th row of Pascal's triangle has weights of 1, 4, 6, 4, 1, which sum to 16. If we convert those to percentages, that comes to: 6%, 25%, 38%, 25%, 6%. As you can see, my bucketing is somewhat in line with Pascal's triangle.
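The conversion from Pascal's weights to percentages is quick to check. The snippet below is just that arithmetic, using the binomial coefficients of row 4 (counting the top row as row 0).

```python
from math import comb

# Row 4 of Pascal's triangle: the binomial coefficients C(4, k) for k = 0..4.
row = [comb(4, k) for k in range(5)]
total = sum(row)

# Each weight as a percentage of the row total.
percentages = [100 * w / total for w in row]

print(row)          # [1, 4, 6, 4, 1]
print(total)        # 16
print(percentages)  # [6.25, 25.0, 37.5, 25.0, 6.25]
```

Rounded to whole numbers, those are the 6%, 25%, 38%, 25%, 6% quoted above.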
There are some advantages in using the discrete values I'm using, not the least of which is simplicity of description. You also get to use a lower threshold for your samples. With a 10/20/40/20/10 scheme, if you have 10 pitches, the fastest pitch goes into one bucket, the slowest into another, the 4 middle-speed pitches into the middle bucket, and so on. If I used Pascal's directly, I'd need 16+ pitches to make it work. Anyway, I like it simple and 10/20/40/20/10 appeals to me.
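The 10-versus-16 threshold can be reproduced under one reading of "make it work": the smallest sample in which every bucket holds a whole number of pitches. That reading, and the helper name `min_sample_size`, are assumptions for this sketch.

```python
from fractions import Fraction
from math import gcd

def min_sample_size(weights):
    """Smallest n for which every bucket's share of n is a whole number."""
    total = sum(weights)
    n = 1
    for w in weights:
        denominator = Fraction(w, total).denominator  # share in lowest terms
        n = n * denominator // gcd(n, denominator)    # running least common multiple
    return n

simple_scheme = min_sample_size([10, 20, 40, 20, 10])
pascal_scheme = min_sample_size([1, 4, 6, 4, 1])
print(simple_scheme, pascal_scheme)  # 10 16
```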
As a nod to Pascal's triangle, and since I need to give a name to what I'm about to do, I'll call these Pascal's Run Values. Let's get on with it.
***
I'm going to focus solely on 2020+2021 data, including post-season. What I do is this: for every pitcher who threw at least 10 4-seam fastballs (henceforth called Risers) in a game, those pitches get split into a 10/20/40/20/10 distribution, from fastest to slowest. By doing it game by game, I neatly bypass any park and temperature effects. That bias goes away. This is probably the most important part of this technique: I can control for this bias. I do this for each pitcher, each game. Then I simply add things up at the group level. For Group 1, the 10% fastest pitches of each pitcher in each of their games, the average speed is 95.1 mph. Group 2 is 94.4 mph, followed by 93.6, 92.7, 91.8. In other words, each group's pitches are about 0.8 mph faster than the next group's.
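The game-by-game procedure can be sketched as follows. Everything here is illustrative: the pitch records are made up, the names `assign_groups` and `group_mean_speeds` are hypothetical, and pooling at the pitch level (every pitch counts once) is an assumption about what "add things up at the group level" means.

```python
from collections import defaultdict

SHARES = (10, 20, 40, 20, 10)
MIN_PITCHES = 10  # minimum pitches of this type in a game, per the text

def assign_groups(speeds):
    """Group number (1 = fastest) for each pitch within one pitcher-game."""
    n = len(speeds)
    order = sorted(range(n), key=lambda i: speeds[i], reverse=True)
    cutoffs, running = [], 0
    for share in SHARES:
        running += share
        cutoffs.append(running * n / 100)
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = next(g for g, cut in enumerate(cutoffs, start=1)
                           if rank + 0.5 < cut)
    return groups

def group_mean_speeds(pitches):
    """pitches: iterable of (pitcher, game, speed). Returns {group: mean speed}."""
    by_outing = defaultdict(list)
    for pitcher, game, speed in pitches:
        by_outing[(pitcher, game)].append(speed)

    pooled = defaultdict(list)
    for speeds in by_outing.values():
        if len(speeds) < MIN_PITCHES:
            continue  # too few pitches to split 10/20/40/20/10
        # Grouping is relative to this pitcher on this day only.
        for speed, group in zip(speeds, assign_groups(speeds)):
            pooled[group].append(speed)
    return {g: sum(v) / len(v) for g, v in sorted(pooled.items())}

# Hypothetical pitch records: two qualifying outings and one that is too short.
pitches = (
    [("A", 1, s) for s in [94, 96, 93, 94, 95, 92, 94, 95, 93, 94]]
    + [("B", 1, s) for s in [89, 88, 91, 89, 90, 87, 89, 88, 90, 89]]
    + [("C", 1, s) for s in [99, 98, 97]]
)
means = group_mean_speeds(pitches)
print(means)  # {1: 93.5, 2: 92.5, 3: 91.5, 4: 90.5, 5: 89.5}
```

Because each pitch is ranked only against the same pitcher's other pitches in the same game, a hard thrower and a soft thrower both contribute to every group, and conditions that are constant within a game cancel out.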
If a pitcher throws a pitch faster than his usual that day, does he have better performance? The answer is: yes. Insofar as Risers are concerned, a pitcher reduces runs at a rate of 0.17 runs per 100 pitches on his fastest (Group 1) pitches. The next fastest, Group 2, reduces runs at a very similar 0.19 runs per 100 pitches. The middle group, the average fastball speed for that pitcher that day, shows a drop of 0.05 runs per 100 pitches. The next group, Group 4, shows an increase of 0.17 runs per 100 pitches. And the last group, Group 5, which represents the 10% slowest Risers thrown by each pitcher on those days, shows an increase of 0.19 runs per 100 pitches.
That's a lot of words for what are essentially numbers. Let me list it for you:
Risers
- -0.17: Group 1, 95.1 mph
- -0.19: Group 2, 94.4 mph
- -0.05: Group 3, 93.6 mph
- +0.17: Group 4, 92.7 mph
- +0.19: Group 5, 91.8 mph
Remember, for pitchers, reducing runs is what they are after, and so, a minus sign is good for the pitcher, while a plus sign is good for the batter.
***
I'll now go through each of the other pitch types.
Sinkers
- +0.20: Group 1, 94.1 mph
- -0.15: Group 2, 93.5 mph
- -0.15: Group 3, 92.7 mph
- -0.10: Group 4, 91.9 mph
- +0.70: Group 5, 91.1 mph
Sinkers have a clear pattern: don't throw it the very hardest, and definitely don't throw it too slow. When I look to see what is happening with the slow-sinkers, two patterns emerge: they don't break as much, and they are thrown in the Waste Region more than normal. In other words, a slow sinker is a sign of a mistake pitch (to some extent anyway).
***
Changeups
- +0.22: Group 1, 86.0 mph
- -0.50: Group 2, 85.2 mph
- -0.76: Group 3, 84.4 mph
- -0.58: Group 4, 83.4 mph
- +0.20: Group 5, 82.5 mph
This is a similar story to the sinker: don't throw your changeup too hard or too slow. And a similar explanation emerged: those pitches had too much or too little break, and ended up in the Waste Region more than usual. Really, it's about knowing what your pitches can do, since you are going for location. Throw the pitch harder or slower than your normal, or with more or less break than your normal, and the pitch will go to an unintended location.
And as you can see by the above data, you want your pitches within +/-1 mph of your usual, and definitely not outside of +/-2 mph of your usual. At least as we've seen with Sinkers and Changeups. With Risers, it's more, faster, better, stronger.
***
Cutters
- +0.62: Group 1, 90.0 mph
- -0.63: Group 2, 89.2 mph
- -0.09: Group 3, 88.2 mph
- -0.50: Group 4, 87.2 mph
- +0.32: Group 5, 86.2 mph
More of the same. A fast cutter is essentially a slow 4-seamer, and those are bad. When I look at the pitch location of hard cutters, they are poorly thrown. So again, same deal as above: too-fast and too-slow cutters are indications of mistake pitches.
***
Sliders
- -0.36: Group 1, 86.7 mph
- -1.24: Group 2, 85.8 mph
- -0.78: Group 3, 84.9 mph
- -0.61: Group 4, 83.8 mph
- -0.15: Group 5, 82.8 mph
Second verse, same as the first. Hard sliders end up in the Waste Region a tremendous number of times, double the normal rate. Again, a sign of a mistake pitch when thrown too hard. Still, Sliders are tremendously valuable, as even a bad slider is a good pitch. Of course, it's possible that only good pitchers can throw a slider to begin with, or have the repertoire to balance out the slider. But that's another story. For this story, it's the same as the others: consistency is the key.
***
Curveballs
- -0.15: Group 1, 81.2 mph
- -0.63: Group 2, 80.4 mph
- -0.84: Group 3, 79.3 mph
- -0.37: Group 4, 78.3 mph
- +0.69: Group 5, 77.2 mph
How many verses are we up to? Once again, you want your curveballs to be thrown at a consistent speed. Not too fast, and certainly not too slow.
***
If we equally weight all our non-4-seamers, we get this:
Simple average, non-4-seamers
- +0.11: Group 1
- -0.63: Group 2
- -0.53: Group 3
- -0.43: Group 4
- +0.36: Group 5
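As a quick check on the simple average, the snippet below recomputes it from the rounded per-pitch-type figures listed in this post. Only the variable names are new. The recomputed values land within 0.01 of the listed averages; the small gaps in Groups 3 and 5 are presumably because the listed figures were averaged before rounding, though that is an assumption.

```python
# Run values per 100 pitches, Groups 1 through 5, as listed above for each pitch type.
run_values = {
    "Sinkers":    [+0.20, -0.15, -0.15, -0.10, +0.70],
    "Changeups":  [+0.22, -0.50, -0.76, -0.58, +0.20],
    "Cutters":    [+0.62, -0.63, -0.09, -0.50, +0.32],
    "Sliders":    [-0.36, -1.24, -0.78, -0.61, -0.15],
    "Curveballs": [-0.15, -0.63, -0.84, -0.37, +0.69],
}
listed_average = [+0.11, -0.63, -0.53, -0.43, +0.36]

# Equal weight per pitch type: average down each group's column.
recomputed = [sum(column) / len(column) for column in zip(*run_values.values())]

for group, (mine, listed) in enumerate(zip(recomputed, listed_average), start=1):
    print(f"Group {group}: recomputed {mine:+.3f}, listed {listed:+.2f}")
```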
It is an unmistakable pattern. You want to be consistent, meaning within about 1 mph of your personal norm, with slightly faster being better than slightly slower. You definitely don't want to go too fast, and certainly not too slow, as those are likely indicators that you are not in full control of the pitch. Except for 4-seam fastballs: those you want to throw as hard as you can.
***
Next? I will look at horizontal break, vertical break, total break, and... something I have never done before: Active Spin (or its complement, Gyro Spin). In other words, how much Gyro do you want? Is it better to have too much or not enough? Or is consistency the key? I have no idea (yet). But we'll learn the answers to that together, next time.