Forecasting Pitchers - Adjacent Seasons
© Tangotiger
Process
The following table presents the performance of pitchers, year-to-year, based on age, from 1955 to 2003. It is limited to pitchers with at least 250 PA in each season of a year-to-year pair, pitching in the same league both years. Each pitcher in the sample is equally weighted. The numbers represent the pitcher's ratio, relative to the league's ratio, in various component categories.
Since that probably made no sense, let's take an example. Rick Wise, at the age of 21, in 1966, faced 413 batters and hit 3 of them. (It's actually 416, but I removed the 3 batters he walked intentionally.) That's a rate of .0073 (3/413). In the AL in 1966, that rate was .0053. I also computed Wise's rate at the age of 22 in 1967.
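The per-batter rate in the Wise example is just events divided by batters faced, with intentional walks taken out of the denominator. A minimal sketch, where the function name `rate_per_batter` is hypothetical and only the Wise figures come from the text:

```python
def rate_per_batter(events: int, batters_faced: int, ibb: int = 0) -> float:
    """Events per batter faced, with intentional walks removed from the denominator."""
    return events / (batters_faced - ibb)

# Rick Wise, 1966, as described above: 416 batters faced, 3 IBB, 3 hit batters.
wise_hbp_1966 = rate_per_batter(events=3, batters_faced=416, ibb=3)
print(f"{wise_hbp_1966:.4f}")  # 0.0073
```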
I repeated this step for all pitchers aged 21 in one season and 22 in the next. I took the simple average of the pitchers' rates and of the corresponding league rates. This gave me an HBP rate of .0058 at age 21 and .0059 at age 22. The league rates were .0057 and .0058. As you can see, these age 21/22 pitchers hit batters at a slightly higher rate than the league overall.
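"Simple average" means every pitcher counts once, however many batters he faced; the league rate for each pitcher is averaged the same way. The sketch below uses a hypothetical list of (pitcher rate, league rate) pairs: only the first pair is Wise's 1966 figures, the other two are made up for illustration.

```python
from statistics import mean

# Hypothetical (pitcher_rate, league_rate) pairs for pitchers at one age.
# Only the first pair (Rick Wise, 1966) comes from the text; the rest are made up.
age_21_pairs = [(0.0073, 0.0053), (0.0035, 0.0060), (0.0052, 0.0055)]

# Simple averages: every pitcher counts once, no matter how many batters he faced.
pitcher_avg = mean(p for p, _ in age_21_pairs)
league_avg = mean(lg for _, lg in age_21_pairs)
print(f"pitchers {pitcher_avg:.4f}, league {league_avg:.4f}")
```

A PA-weighted average would instead let workhorse pitchers dominate each age group; equal weighting keeps every pitcher's aging pattern on the same footing.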
I then turned these rates into ratios (.0058 became .0058/(1-.0058) = .00583, and .0057 became .00573). That's the ratio of hit batters to batters not hit.
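The conversion from a rate p to the ratio p/(1 - p) can be sketched in a couple of lines; the function name `to_ratio` is a hypothetical label, and the inputs are the averaged rates from the text.

```python
def to_ratio(rate: float) -> float:
    """Turn a per-batter rate p into p / (1 - p): events per non-event."""
    return rate / (1.0 - rate)

print(f"{to_ratio(0.0058):.5f}")  # 0.00583
print(f"{to_ratio(0.0057):.5f}")  # 0.00573
```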
Then I normalized to the league: .00583/.00573 = 1.017 for age 21, and the same calculation gives 1.017 for age 22. If I went to more decimal places, the age 22 figure would have been slightly higher.
Finally, I divided the normalized age 22 ratio by the normalized age 21 ratio, which gives 1.004. Because this is above 1, it means that a pitcher is slightly more likely to hit a batter at age 22 than at age 21.
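Putting the normalization and the age-to-age division together, a minimal sketch might look like this. The function names are hypothetical, and the rates passed to `year_to_year` are made-up inputs rather than the study's averages; only the .00583/.00573 check uses figures from the text.

```python
def to_ratio(rate: float) -> float:
    """Per-batter rate p expressed as p / (1 - p)."""
    return rate / (1.0 - rate)

def normalized(pitcher_rate: float, league_rate: float) -> float:
    """Pitcher's ratio divided by the league's ratio (1.0 = league average)."""
    return to_ratio(pitcher_rate) / to_ratio(league_rate)

def year_to_year(age_x: tuple, age_x1: tuple) -> float:
    """Normalized ratio at age X+1 divided by normalized ratio at age X.
    Each argument is a (pitcher_rate, league_rate) pair of averaged rates."""
    return normalized(*age_x1) / normalized(*age_x)

# The text's own intermediate figures for age 21.
print(f"{0.00583 / 0.00573:.3f}")  # 1.017

# Hypothetical rates for some age pair (not from the study).
change = year_to_year((0.0060, 0.0055), (0.0063, 0.0056))
print(f"{change:.3f}")
```

A result above 1 means the rate rose relative to the league from age X to X+1; below 1 means it fell.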
Year-to-year change in performance
Here is the result of all that for each year-to-year age pair:
So, what does this show us? According to this list, a pitcher's strikeout rate goes down for every year-to-year pair. His age 22 K ratio is 98.8% of what it was at age 21, and the same kind of decline shows up for every later pair. A pitcher's hits allowed per ball in park go up for every year-to-year pair.
The next thing to do is to "chain" them. If from age 21 to 22 the K ratio is .988 and from age 22 to 23 it's .974, then from 21 to 23 it's .988 x .974 = .962. Got that?
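Chaining is a running product of the year-to-year ratios, anchored at 1.0 for the starting age. A minimal sketch, where `chain` is a hypothetical helper and the two K ratios are the ones from the example:

```python
from itertools import accumulate
from operator import mul

def chain(pair_ratios, start_age):
    """Cumulative product of year-to-year ratios, indexed by age.
    The curve is 1.0 at start_age; each later age multiplies in one more ratio."""
    curve = {start_age: 1.0}
    for offset, cumulative in enumerate(accumulate(pair_ratios, mul), start=1):
        curve[start_age + offset] = cumulative
    return curve

k_curve = chain([0.988, 0.974], start_age=21)
print(f"{k_curve[23]:.3f}")  # 0.962
```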
Year-to-year performances, chained
Here then is the result of all that:
According to this table, a pitcher peaks at age 27 for hit batters, 29 for walks, 21 for strikeouts, 22 for HR, and 20 for hits on balls in play. The average would be around 23 or 24.
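Reading a peak off a chained curve depends on the direction of the stat: for strikeouts the pitcher is best where the ratio is highest, while for hit batters, walks, HR, and hits he is best where it is lowest. A sketch with a hypothetical `peak_age` helper; the first three K values echo the chaining example above, and everything else is made up.

```python
def peak_age(curve, higher_is_better):
    """Age at which the chained curve is best for the pitcher.
    For strikeouts a higher ratio is better; for HBP, BB, HR and hits a lower one is."""
    pick = max if higher_is_better else min
    return pick(curve, key=curve.get)

# Hypothetical chained curves (not the study's numbers).
k_curve = {21: 1.000, 22: 0.988, 23: 0.962, 24: 0.950}
bb_curve = {21: 1.000, 22: 0.990, 23: 0.985, 24: 0.992}
print(peak_age(k_curve, higher_is_better=True))    # 21
print(peak_age(bb_curve, higher_is_better=False))  # 23
```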
What's the problem?
Selective sampling. Selective sampling and lack of regression toward the mean. The pitchers who are allowed back-to-back years of 250 PA have, on average, performed better than the league in year X. But performance does not equal ability. Observed performance is equal to the underlying true talent plus luck. And the better your performance, the more likely it included good luck rather than bad luck (on average, and for a large enough group of players). So in year X+1, when that luck does not carry over, this group will look worse even if its talent has not changed at all.
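The effect can be seen in a toy simulation. Everything here is an assumption made for illustration: talent and luck are normally distributed with made-up spreads, there is no aging at all, and a pitcher "keeps his job" only if his year X K rate beats the league. Even so, the selected group's year X+1 average comes out lower than its year X average.

```python
import random
from statistics import mean

def simulate_selection(n=20000, talent_sd=0.02, luck_sd=0.02, seed=1):
    """Toy model: K rate relative to league = talent + luck, with no aging.
    Returns (mean year X, mean year X+1) for pitchers selected on year X."""
    rng = random.Random(seed)
    year_x, year_x1 = [], []
    for _ in range(n):
        talent = rng.gauss(0.0, talent_sd)         # true ability (0 = league average)
        obs_x = talent + rng.gauss(0.0, luck_sd)   # observed year X = talent + luck
        obs_x1 = talent + rng.gauss(0.0, luck_sd)  # year X+1: same talent, fresh luck
        if obs_x > 0.0:                            # hypothetical rule: beat the league to play again
            year_x.append(obs_x)
            year_x1.append(obs_x1)
    return mean(year_x), mean(year_x1)

x, x1 = simulate_selection()
print(f"year X {x:+.4f}, year X+1 {x1:+.4f}")
```

The apparent decline here is pure selection; with no aging built in, the fix is to regress year X before comparing.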
So, what we need is to regress each year X performance, before comparing it to year X+1. But, how much to regress? In another study, I concluded that the regression for pitchers with 650 PA (the average of these year-to-year pairs) was about 10% for K, 20% for BB, and 50% for HR, H, and HBP. So, let's see what happens when I use these.
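A minimal sketch of the regression step, assuming (this is not stated in the text) that "regress X%" means moving the normalized year X ratio X% of the way back toward league average, 1.0. The helper names are hypothetical; the regression amounts are the ones quoted above.

```python
# Assumption: regression is applied to the normalized ratio, toward 1.0 (league average).
def regress(ratio, amount):
    """amount=0.10 keeps 90% of the distance from league average."""
    return 1.0 + (1.0 - amount) * (ratio - 1.0)

REGRESSION = {"K": 0.10, "BB": 0.20, "HR": 0.50, "H": 0.50, "HBP": 0.50}

def regressed_year_to_year(ratio_x, ratio_x1, stat):
    """Year X+1 ratio over the regressed year X ratio.
    Following the text, only year X is regressed."""
    return ratio_x1 / regress(ratio_x, REGRESSION[stat])

print(f"{regress(1.10, REGRESSION['K']):.2f}")  # 1.09
```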
Year to year performances, chained, and regressed
Whoa, big difference. Now a pitcher peaks in hit batters and walks at age 39! This is very interesting, since I also found hitters peaking in walks in their late 30s. That is, both hitters and pitchers get much smarter as they age: pitchers keep reducing their walk rates into their late 30s (presumably because they can no longer overpower hitters), and hitters keep increasing their walk rates into their late 30s (presumably because they can no longer overpower pitchers).
Even with regression, a pitcher's K, HR, and hits ratios top out at age 21 or 22.
What if my regression rates are wrong? Well, that's definitely a possibility. How much you regress has a huge impact on the whole chaining process. What kind of regression rate would I need to make the K rate peak a little later? If I force in a regression rate of 30% for K (instead of 10%), and 75% for HR and hits (instead of 50%), here's what we get:
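To see how sensitive the chained peak is to the regression amount, here is a sketch with made-up average K ratios (year X, year X+1) for the age pairs 21/22 through 24/25. As before, it assumes regression toward 1.0 and treats a higher K ratio as better; the numbers are chosen only to show the mechanism, not taken from the study.

```python
from itertools import accumulate
from operator import mul

def regress(ratio, amount):
    """Move a normalized ratio part of the way back toward 1.0 (assumed league mean)."""
    return 1.0 + (1.0 - amount) * (ratio - 1.0)

def chained_peak(pairs, start_age, amount):
    """Regress each year X ratio, chain the year-to-year changes,
    and return the age where the chained K curve is highest."""
    changes = [x1 / regress(x, amount) for x, x1 in pairs]
    curve = {start_age: 1.0}
    for offset, cumulative in enumerate(accumulate(changes, mul), start=1):
        curve[start_age + offset] = cumulative
    return max(curve, key=curve.get)

# Hypothetical (year X, year X+1) average normalized K ratios, ages 21/22 .. 24/25.
pairs = [(1.10, 1.088), (1.10, 1.084), (1.10, 1.066), (1.10, 1.060)]

for amount in (0.0, 0.10, 0.30):
    print(amount, chained_peak(pairs, start_age=21, amount=amount))
```

With these inputs, no regression or 10% leaves the peak at 21, while 30% moves it to 23, which is the kind of shift described here.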
Year to year performances, chained, and regressed (part 2)
Now what do we have? Hit batters and walks remain with a peak of age 39. K rates peak at age 25 (and that's as high as I can get it). HR rates peak at age 21, but are fairly static between age 23 and 35. Hits allowed on balls in park are essentially flat (i.e., a pitcher's ability there does not change with age).
Conclusion
Be careful!
Seriously, how the selective sampling issues are dealt with will severely impact the results.
You can also consider taking pitchers with at least 5 consecutive years of 250 PA. But, there's selective sampling there too. If a pitcher managed to get that much playing time, then chances are his performance did not deteriorate as much as others' did. If you took pitchers with at least 15 years, what do you think we'd get? Exactly. Rather flat aging processes. However, what these different selections will give you is sort of a "max". If the K rate tops out at age 25 under every selection criterion you try, then you know what? Chances are, that's the peak.
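Building the "at least 5 consecutive years" sample means finding, for each pitcher, the longest unbroken run of qualifying seasons. A sketch with hypothetical helper names and made-up season totals; it does not model the same-league condition, and a missing season breaks the run.

```python
def longest_qualifying_run(pa_by_year, min_pa=250):
    """Length of the longest run of consecutive seasons with at least min_pa."""
    best = run = 0
    prev_year = None
    for year in sorted(pa_by_year):
        if pa_by_year[year] >= min_pa:
            consecutive = prev_year is not None and year == prev_year + 1
            run = run + 1 if consecutive else 1
            prev_year = year
        else:
            run = 0
            prev_year = None
        best = max(best, run)
    return best

# Hypothetical PA totals by season (not real pitchers).
pitchers = {
    "A": {1966: 450, 1967: 700, 1968: 820, 1969: 900, 1970: 760},
    "B": {1966: 500, 1967: 610, 1969: 640},
    "C": {1966: 300, 1967: 120, 1968: 400, 1969: 450},
}
durable = [name for name, seasons in pitchers.items() if longest_qualifying_run(seasons) >= 5]
print(durable)  # ['A']
```

Raising the threshold from 5 to 15 only shrinks the sample toward the most durable pitchers, which is why each stricter selection tends to flatten the curve.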
The number of prior MLB PAs might also affect the aging process.