Tangotiger Blog

Friday, June 04, 2021

Pascal’s Run Values

By Tangotiger

A couple of years ago, I introduced a technique to analyze data in a powerful yet simple manner. I'm sure some form or other has been around long before I used it. Indeed, I was inspired when I first saw former Statcast intern Tess Kolp use something like it.

One of the methods I use often is to group data based on its values. All 99.5 to 100.5 mph batted balls go into the 100 mph bucket, the 100.5 to 101.5 mph balls go into the 101 mph bucket, and so on. Simple enough, and it works quite well. Where it doesn't work well is when you group by, say, temperature. Games at 40 F and 105 F are few and far between, while those at 70 to 75 F are overwhelmingly common. Once you bin the data, if you don't track the weight (number of games, PA, etc.) that makes up each bucket, then you will let those few games at 40 F carry the same weight as the many at 70 F.
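
As a minimal sketch of value-based bucketing that keeps the weight of each bucket, assuming a hypothetical helper named bucket_by_value and made-up inputs (neither is from my actual code):

```python
import math
from collections import defaultdict

def bucket_by_value(values, outcomes, width=1.0):
    """Group outcomes by the value rounded to the nearest bucket center.

    Returns {bucket_center: (mean_outcome, count)} so the count (the
    weight) travels with each bucket instead of being thrown away.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for v, y in zip(values, outcomes):
        # 99.5 up to (but not including) 100.5 lands in the 100 bucket
        center = math.floor(v / width + 0.5) * width
        totals[center][0] += y
        totals[center][1] += 1
    return {c: (s / n, n) for c, (s, n) in sorted(totals.items())}

# Illustrative exit speeds (mph) and some outcome per batted ball
speeds = [99.6, 100.4, 100.5, 101.2]
outcomes = [1.0, 3.0, 5.0, 7.0]
buckets = bucket_by_value(speeds, outcomes)
```

Carrying the count alongside the mean is what lets a later step weight a sparse 40 F bucket less than a crowded 70 F one.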

What you can do instead is create percentile buckets: order all the games from warmest to coldest, then put the 1% warmest games in one bucket, the next 1% warmest games in another bucket, and so on. (Thank you Tess!) This way, you have 100 buckets, equally weighted (same number of games). This also works quite well. Where it becomes a mess is close to the median (50th percentile) point: the temperatures in the 49th, 50th, and 51st buckets are going to be extremely close to each other. Any up/down results you see around that 50th percentile mark will be, essentially, pure Random Variation.
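
A rough sketch of the percentile approach, assuming a hypothetical function named percentile_buckets that labels each observation with a bucket from 0 (warmest) to n_buckets - 1 (coldest):

```python
def percentile_buckets(values, n_buckets=100):
    """Assign each value to one of n_buckets equal-count buckets.

    Bucket 0 holds the largest values, the last bucket the smallest.
    Ties are broken by original order (an assumption of this sketch).
    """
    n = len(values)
    # Indices of the observations, ordered from largest to smallest value
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic keeps the bucket sizes as equal as possible
        labels[i] = rank * n_buckets // n
    return labels

# 200 illustrative game temperatures: two games land in each of 100 buckets
temps = list(range(200))
labels = percentile_buckets(temps, n_buckets=100)
```

Every bucket carries the same weight here, but neighboring buckets near the middle cover nearly identical temperatures, which is where the noise comes from.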

The technique I introduced is a merge of these two bucketing approaches. I still order the data based on percentiles. But now, instead of creating 100 buckets of 1% of the data each, I create 5 buckets, where the two most extreme buckets each contain 10% of the data. The middle bucket contains 40% of the data. The two remaining buckets contain 20% of the data each. It looks like this: 10%, 20%, 40%, 20%, 10%.

Fans of Pascal's triangle may see a glimmer in there. The 4th order row of Pascal's triangle has weights of 1, 4, 6, 4, 1. If we convert those to percentages, that comes to roughly: 6%, 25%, 38%, 25%, 6%. As you can see, my bucketing is somewhat in line with Pascal's triangle.
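
For anyone who wants to check the arithmetic, here is a small sketch that builds the row from binomial coefficients and converts it to percentages (the variable names are only for illustration):

```python
import math

ORDER = 4  # the row 1, 4, 6, 4, 1

# Binomial coefficients C(4, k) for k = 0..4
weights = [math.comb(ORDER, k) for k in range(ORDER + 1)]
total = sum(weights)  # 2**4 = 16

# Share of the total carried by each position, in percent
percentages = [100 * w / total for w in weights]
```

The exact shares are 6.25%, 25%, 37.5%, 25%, 6.25%, which is where the rounded 6/25/38/25/6 comes from.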

There are some advantages in using the discrete values I'm using, not the least of which is simplicity of description. You also get to use a lower threshold for your samples. With a 10/20/40/20/10 scheme, if you have 10 pitches, the fastest pitch goes into one bucket, the slowest into another, the 4 middle-speed pitches into the middle bucket, and so on. If I used Pascal's directly, I'd need 16+ pitches to make it work. Anyway, I like it simple and 10/20/40/20/10 appeals to me.

As a nod to Pascal's triangle, and since I need to give a name to what I'm about to do, I'll call these Pascal's Run Values. Let's get on with it.

***

I'm going to focus solely on 2020+2021 data, including post-season. For every pitcher who threw at least 10 4-seam fastballs (henceforth called Risers) in a game, I split those pitches into a 10/20/40/20/10 distribution from fastest to slowest. By doing it game by game, I neatly bypass any park and temperature effects. That bias goes away. Controlling for this bias is probably the most important part of the technique. I do this for each pitcher, each game. Then I simply add things up at the group level. For Group 1, the 10% fastest pitches of each pitcher in each of their games, the average speed is 95.1 mph. Group 2 is 94.4 mph, followed by 93.6, 92.7, 91.8. In other words, each group's pitches are about 0.8 mph faster than the next group's.
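
The game-by-game procedure can be sketched roughly as follows. The tuple layout (pitcher_id, game_id, speed_mph, run_value), the function names, and the handling of pitch counts that are not multiples of 10 are all assumptions made for illustration, not my production code:

```python
from collections import defaultdict

CUM_PCT = (10, 30, 70, 90, 100)  # cumulative 10/20/40/20/10 boundaries
MIN_PITCHES = 10                 # minimum pitches of this type in a game

def group_of(rank, n):
    """Group 1-5 for the pitch at 0-based speed rank among n pitches."""
    for g, cum in enumerate(CUM_PCT, start=1):
        if rank * 100 < cum * n:
            return g
    return len(CUM_PCT)

def summarize(pitches):
    """pitches: iterable of (pitcher_id, game_id, speed_mph, run_value).

    Returns {group: (mean_speed, runs_per_100_pitches, pitch_count)}.
    """
    # Step 1: collect pitches separately for each pitcher in each game
    by_outing = defaultdict(list)
    for pitcher_id, game_id, speed, run_value in pitches:
        by_outing[(pitcher_id, game_id)].append((speed, run_value))

    # Step 2: rank within each outing, then add up at the group level
    totals = {g: [0.0, 0.0, 0] for g in range(1, 6)}
    for outing in by_outing.values():
        n = len(outing)
        if n < MIN_PITCHES:
            continue  # too few pitches to split into five groups
        outing.sort(key=lambda p: p[0], reverse=True)  # fastest first
        for rank, (speed, run_value) in enumerate(outing):
            t = totals[group_of(rank, n)]
            t[0] += speed
            t[1] += run_value
            t[2] += 1
    return {g: (s / c, 100 * r / c, c) for g, (s, r, c) in totals.items() if c}
```

Because the ranking happens inside each pitcher-game, a pitch is only ever compared with that pitcher's other pitches on that day, which is what removes the park and temperature effects.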

If a pitcher throws a pitch faster than his usual that day, does he have better performance? The answer is: yes. Insofar as Risers are concerned, a pitcher reduces runs at a rate of 0.17 runs per 100 pitches on his fastest (Group 1) pitches. The next fastest, Group 2, reduces runs at a very similar 0.19 runs per 100 pitches. The middle group, the average fastball speed for that pitcher that day, shows a drop of 0.05 runs per 100 pitches. The next group is at +0.17 runs of increase per 100 pitches. And the last group, Group 5, which represents the 10% slowest Risers thrown by each pitcher on those days, is at +0.19 runs of increase.

That's a lot of words for what is essentially numbers. Let me list it for you:

Risers

-0.17: Group 1, 95.1 mph
-0.19: Group 2, 94.4 mph
-0.05: Group 3, 93.6 mph
+0.17: Group 4, 92.7 mph
+0.19: Group 5, 91.8 mph

Remember, for pitchers, reducing runs is what they are after, and so, a minus sign is good for the pitcher, while a plus sign is good for the batter.

***

I'll now go through each of the other pitch types.

Sinkers

+0.20: Group 1, 94.1 mph
-0.15: Group 2, 93.5 mph
-0.15: Group 3, 92.7 mph
-0.10: Group 4, 91.9 mph
+0.70: Group 5, 91.1 mph

Sinkers have a clear pattern: don't throw it the very hardest, and definitely don't throw it too slow. When I look to see what is happening with the slow-sinkers, two patterns emerge: they don't break as much, and they are thrown in the Waste Region more than normal. In other words, a slow sinker is a sign of a mistake pitch (to some extent anyway).

***

Changeups

+0.22: Group 1, 86.0 mph
-0.50: Group 2, 85.2 mph
-0.76: Group 3, 84.4 mph
-0.58: Group 4, 83.4 mph
+0.20: Group 5, 82.5 mph

This is a similar story as with the sinker: don't throw your changeup too hard or too slow. And a similar pattern emerged: those pitches had too much or too little break, and ended up in the Waste Region more than usual. Really, it's about knowing what your pitches can do, since you are going for location. Throw the pitch harder or slower than your normal, or with more or less break than your normal, and the pitch will go to an unintended location.

And as you can see by the above data, you want your pitches within +/-1 mph of your usual, and definitely not outside of +/-2 mph of your usual. At least as we've seen with Sinkers and Changeups. With Risers, it's more, faster, better, stronger.

***

Cutters

+0.62: Group 1, 90.0 mph
-0.63: Group 2, 89.2 mph
-0.09: Group 3, 88.2 mph
-0.50: Group 4, 87.2 mph
+0.32: Group 5, 86.2 mph

More of the same. A fast cutter is essentially a slow 4-seamer, and those are bad. When I look at the pitch location of hard cutters, they are poorly thrown. So again, same deal as above: too-fast and too-slow cutters are indications of mistake pitches.

***

Sliders

-0.36: Group 1, 86.7 mph
-1.24: Group 2, 85.8 mph
-0.78: Group 3, 84.9 mph
-0.61: Group 4, 83.8 mph
-0.15: Group 5, 82.8 mph

Second verse, same as the first. Hard sliders end up in the Waste Region a tremendous number of times, double the normal rate. Again, a sign of a mistake pitch when thrown too hard. Still, Sliders are tremendously valuable, as even a bad slider is a good pitch. Of course, it's possible that only good pitchers can throw a slider to begin with, or have the repertoire to balance out the slider. But that's another story. For this story, it's the same as the others: consistency is the key.

***

Curveballs

-0.15: Group 1, 81.2 mph
-0.63: Group 2, 80.4 mph
-0.84: Group 3, 79.3 mph
-0.37: Group 4, 78.3 mph
+0.69: Group 5, 77.2 mph

How many verses are we up to? Once again, you want your curveballs to be thrown at a consistent speed. Not too fast, and certainly not too slow.

***

If we equally weight all our non-4-seamers, we get this:

Simple average, non-4-seamers

+0.11: Group 1
-0.63: Group 2
-0.53: Group 3
-0.43: Group 4
+0.36: Group 5
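
As a quick check, here is a sketch that averages the rounded figures listed above for the five non-4-seam pitch types, weighting each pitch type equally. Because the inputs are already rounded to two decimals, the result can differ from the list above by 0.01 in a group (Groups 3 and 5 here); the names are only for illustration:

```python
# Runs per 100 pitches for Groups 1-5, copied from the lists above
RUN_VALUES = {
    "Sinkers":    [+0.20, -0.15, -0.15, -0.10, +0.70],
    "Changeups":  [+0.22, -0.50, -0.76, -0.58, +0.20],
    "Cutters":    [+0.62, -0.63, -0.09, -0.50, +0.32],
    "Sliders":    [-0.36, -1.24, -0.78, -0.61, -0.15],
    "Curveballs": [-0.15, -0.63, -0.84, -0.37, +0.69],
}

def simple_average(table):
    """Equal-weight average across pitch types, one value per group."""
    n_types = len(table)
    # zip(*rows) walks the table column by column, i.e. group by group
    return [sum(column) / n_types for column in zip(*table.values())]

averages = simple_average(RUN_VALUES)
for group, value in enumerate(averages, start=1):
    print(f"{value:+.2f}: Group {group}")
```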

It is an unmistakable pattern. You want to be consistent, meaning around 1 mph of your personal norm, with slightly faster better than slightly slower. You definitely don't want to go too fast, and certainly not too slow, as those are likely indicators that you are not in full control of the pitch. Except for 4-seam fastballs: those you want to throw as hard as you can.

***

Next? I will look at horizontal break, vertical break, total break, and... something I have never done before: Active Spin (or its complement, Gyro Spin). In other words, how much Gyro do you want? Is it better to have too much or not enough? Or is consistency the key? I have no idea (yet). But we'll learn the answers to that together, next time.

(8) Comments • 2021/06/14 • Ball_Tracking
