Earlier this week we introduced the run estimator SIERA, providing a general summary of its purpose as well as the evolution of its development. Today, in Part 3, our focus shifts to the quantitative side of the metric, with a detailed look at the data used to derive the formula and at the regression techniques applied to it. The transparency should provide a better understanding of the integrity of the process as well as a few insights into how SIERA approaches pitcher valuation.
The Data
All data used throughout this process, be it the calculation of SIERA or the various other comparative estimators, came from Retrosheet, a monumental achievement in the world of data without which several advancements in the field would not exist. The first step involved extracting seasonal tallies from the main events table, with statistics grouped by pitcher, team, and year. This way, a pitcher with stints on various clubs in the same season carries a separate entry for each; Cliff Lee, for example, appears as both a Phillies and an Indians pitcher last season. Next, using the Lahman database, the pitching park factor (PPF) was added to each row in the table.
Park-adjusted ERA was then calculated, though only half of the park factor was applied to each pitcher, given that only half of a team’s games are played at the home stadium. If a pitcher ended up with a PPF of 105, we took 97.5 percent of his ERA rather than 95 percent, reflecting one-half of the difference between the park he called home and a neutral one. With the adjustment applied to raw ERA, the next issue to address was batted ball reliability.
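The half-strength park adjustment can be sketched in a few lines. This is a minimal illustration that assumes the adjustment is subtractive, as the 105-to-97.5-percent example implies; the function name `park_adjusted_era` and the sample ERA are hypothetical and not taken from the study.

```python
def park_adjusted_era(era: float, ppf: float) -> float:
    """Apply half of a pitching park factor (100 = neutral) to a raw ERA.

    Assumption: the adjustment is subtractive, so a PPF of 105 keeps
    97.5 percent of the raw ERA, matching the example in the text.
    """
    half_effect = (ppf - 100.0) / 2.0 / 100.0  # e.g. 105 -> 0.025
    return era * (1.0 - half_effect)


# Hypothetical pitcher: 4.00 raw ERA in a park with a PPF of 105.
print(round(park_adjusted_era(4.00, 105), 3))
```

A pitcher in a neutral park (PPF of 100) is left unchanged, and one in a pitcher-friendly park (PPF below 100) has his ERA nudged upward by the same half-strength amount.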
While Retrosheet provides a fantastic wealth of information, batted ball data is realistically only usable from 2003 to the present. The major reason is that the way balls in play were scored has not been consistent. Before 2003, batted ball types were only recorded on outs, meaning that a ground ball single through the third base hole was logged simply as a single while a ground out to the second baseman went down as a grounder. Both are ground balls, but this rather vast issue precludes the use of batted ball data prior to that season.
Given this restriction, only data from 2003-09 moved on to the next round. With that table in place, the QERA formula was unfoiled (multiplied out term by term) and the nine resulting terms were calculated for each row in the database table. The data was then ready for further processing and rigorous study.
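As a rough sketch of that last step, the nine terms can be built from three per-plate-appearance rates: the rates themselves, their squares, and their pairwise products. The function name, argument names, and sample tallies below are hypothetical; only the nine-term structure comes from the regression described here.

```python
from itertools import combinations_with_replacement


def qera_terms(so, bb, gb, fb, pu, pa):
    """Return the nine regressors for one pitcher-team-season row."""
    base = {
        "SO/PA": so / pa,
        "BB/PA": bb / pa,
        "(GB-FB-PU)/PA": (gb - fb - pu) / pa,
    }
    terms = dict(base)  # the three linear terms
    # Three squares plus three pairwise interactions.
    for (n1, v1), (n2, v2) in combinations_with_replacement(base.items(), 2):
        name = f"({n1})^2" if n1 == n2 else f"({n1})*({n2})"
        terms[name] = v1 * v2
    return terms


# Hypothetical season line, purely for illustration.
row = qera_terms(so=200, bb=50, gb=300, fb=200, pu=20, pa=800)
for name, value in row.items():
    print(f"{name:<28}{value: .4f}")
```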
The Results
SIERA was first estimated with 10 parameters: an intercept and the nine aforementioned terms that surface once QERA is unfoiled. This involved regressing park-adjusted ERA on all nine terms. The results can be seen below:
VARIABLE                       COEF.   T-STAT   P-STAT
Constant                       6.368    16.97    0.000
SO/PA                        -18.341    -7.10    0.000
BB/PA                          9.471     2.00    0.046
(GB-FB-PU)/PA                 -1.807    -1.60    0.110
(SO/PA)^2                     10.254     1.98    0.048
(BB/PA)^2                      6.833     0.33    0.742
((GB-FB-PU)/PA)^2             -7.063    -3.93    0.000
((GB-FB-PU)/PA)*(SO/PA)        9.661     2.38    0.017
((GB-FB-PU)/PA)*(BB/PA)       -3.208    -0.44    0.661
(BB/PA)*(SO/PA)                2.828     0.18    0.857
Before getting into what the data originally said, a description of the columns is in order. The first column lists the variable in question and the second lists the coefficient estimated by the regression. The t-statistic describes how many standard errors from zero the coefficient strayed, and the p-statistic gives the probability that a coefficient at least that far from zero would surface if the true effect of the variable on park-adjusted ERA were actually zero.
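To make the link between the last two columns concrete, here is a small sketch that turns a t-statistic into a two-sided p-value. It assumes a standard normal approximation, which is reasonable only for large samples; the regression itself would have used a t distribution whose degrees of freedom are not given here, so the third decimal can differ slightly from the table.

```python
import math


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic, using a normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistics copied from three rows of the table above.
for name, t in [("BB/PA", 2.00), ("(GB-FB-PU)/PA", -1.60), ("(BB/PA)^2", 0.33)]:
    print(f"{name:<15} t = {t:5.2f}   p is roughly {two_sided_p(t):.3f}")
```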
It is commonly accepted that coefficients with p-stats below .05 or .10 are probably different from zero. Unfortunately, reliable data for balls in play only exists for 2003-09, so several coefficients that make intuitive sense do not reach significance. Our intuition helped to build this model, with the understanding that more accurate results will come as pitchers return to the mound and add more seasons of data. Note that the above table does not show the final formula for SIERA, but rather the original estimation, with park-adjusted ERA regressed on every term of the unfoiled QERA formula. Also note that our original tests used 2003-08 data only; 2009 was held out so that the regression could eventually be tested on outside data. The table above nonetheless includes 2009, so that it can be compared directly with the table below.
What immediately stands out is that the quadratic term for walks is not significant: the .74 p-stat indicates a 74 percent chance of getting a value further from zero than 6.833 if the true quadratic effect of walks on ERA were zero. The conclusion: the effect of walks on ERA is linear, though perhaps with interactions with strikeouts or ground balls. It is also evident that the interaction between strikeouts and walks is insignificant. This seems plausible, seeing as there is no reason to assume walks increase ERA more for high-strikeout pitchers than for those with low whiff totals.
Two quadratic terms are significant, as is one interaction term. The interaction between walks and ground balls could have been dropped, but intuition kept it afloat: since the interaction of strikeouts and ground balls is significant, consistency requires keeping the walk interaction as well. The reason the strikeout interaction is believed to be clinically significant is that pitchers who strike more batters out allow fewer singles and need fewer double plays. The same reasoning applies to walks.
Removing the other two insignificant terms moves the walk and ground ball interaction term closer to significance, though it remains far from it. It is our belief that including this interaction gives a more accurate prediction of a pitcher’s skill level and that the coefficient is insignificant only because the sample size is too small. Some of the other effects are even crisper when the regression is run with the two insignificant terms removed:
VARIABLE                       COEF.   T-STAT   P-STAT
Constant                       6.262    28.07    0.000
SO/PA                        -18.055    -8.39    0.000
BB/PA                         11.292    12.81    0.000
(GB-FB-PU)/PA                 -1.721    -1.57    0.116
(SO/PA)^2                     10.169     1.97    0.049
((GB-FB-PU)/PA)^2             -7.069    -3.94    0.000
((GB-FB-PU)/PA)*(SO/PA)        9.561     2.38    0.017
((GB-FB-PU)/PA)*(BB/PA)       -4.027    -0.58    0.563
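For readers who want to plug numbers in, the reduced regression can be evaluated directly from the coefficients in this table. The sketch below is only that: the function name and the sample rates are hypothetical, and the coefficients are the rounded values printed above.

```python
# Rounded coefficients copied from the reduced regression table.
COEF = {
    "const": 6.262, "so": -18.055, "bb": 11.292, "gb": -1.721,
    "so2": 10.169, "gb2": -7.069, "gb_so": 9.561, "gb_bb": -4.027,
}


def fitted_era(so_pa: float, bb_pa: float, gb_net_pa: float) -> float:
    """Fitted park-adjusted ERA; gb_net_pa stands for (GB-FB-PU)/PA."""
    c = COEF
    return (
        c["const"]
        + c["so"] * so_pa
        + c["bb"] * bb_pa
        + c["gb"] * gb_net_pa
        + c["so2"] * so_pa ** 2
        + c["gb2"] * gb_net_pa ** 2
        + c["gb_so"] * gb_net_pa * so_pa
        + c["gb_bb"] * gb_net_pa * bb_pa
    )


# Hypothetical pitcher: 20% strikeouts, 8% walks, net ground-ball rate 0.05.
print(round(fitted_era(0.20, 0.08, 0.05), 2))
```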
Four terms are worthy of further explanation as they are significant, or close enough to significant, like in the case of the linear term in (GB-FB-PU)/PA since its square proved to be significant. Each will be explained separately:
- (SO/PA)^2 has a significant and positive coefficient, even though the linear SO/PA has a significant and negative coefficient. Essentially, this means that although increasing strikeout rate lowers ERA, whiffing more hitters has a diminishing effect on run prevention. If you take the derivative of SIERA with respect to SO/PA (the amount that SIERA changes relative to the change in SO/PA at a given level), you get the following:
-18.054 + 20.337*SO/PA + 9.561*(GB-FB-PU)/PA
Ignoring the third term for now by setting (GB-FB-PU)/PA to zero (to simplify calculations), an increase in strikeouts from 0 to 1 percent decreases ERA by about 0.179; an increase from 10 percent to 11 percent decreases ERA by about 0.138; and an increase from 20 percent to 21 percent decreases ERA by about 0.097. Basically, strikeouts are most useful with runners on base, and the more whiffs a pitcher tallies, the fewer runners reach base. Strikeouts therefore have a gradually diminishing effect on run prevention: someone who strikes out 90 percent of the hitters he faces is not doing much harm by allowing a few more balls in play.
- ((GB-FB-PU)/PA)^2 has a significant negative coefficient, adding to the negative coefficient on (GB-FB-PU)/PA. The more ground balls a pitcher allows, the more he benefits from even more worm beaters: ground balls that get through are usually singles, so there will be more runners to double up. The derivative of SIERA with respect to (GB-FB-PU)/PA can be seen below:
-1.721 - 14.138*(GB-FB-PU)/PA + 9.561*SO/PA - 4.027*BB/PA
Using league average strikeout and walk rates, raising (GB-FB-PU)/PA from 0 to 0.05 would drop ERA by 0.104; from 0.05 to 0.10, by 0.139; and from 0.10 to 0.15, by 0.174. The more ground balls a pitcher gets, the more he will benefit by getting more of them.
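The two derivatives can be turned into step changes with a short sketch. It uses the rounded coefficients from the reduced table and, as a simplifying assumption of this example, holds the other rates at zero; with different rates held fixed, or with unrounded coefficients, the outputs will shift. The function names are hypothetical.

```python
# Rounded coefficients copied from the reduced regression table.
B_SO, B_SO2 = -18.055, 10.169
B_GB, B_GB2 = -1.721, -7.069
B_GB_SO, B_GB_BB = 9.561, -4.027


def d_era_d_so(so_pa, gb_net_pa=0.0):
    """Slope of fitted ERA with respect to SO/PA."""
    return B_SO + 2.0 * B_SO2 * so_pa + B_GB_SO * gb_net_pa


def d_era_d_gb(gb_net_pa, so_pa=0.0, bb_pa=0.0):
    """Slope of fitted ERA with respect to (GB-FB-PU)/PA."""
    return B_GB + 2.0 * B_GB2 * gb_net_pa + B_GB_SO * so_pa + B_GB_BB * bb_pa


def step_change(slope_fn, start, stop, **held_fixed):
    """Change in fitted ERA over [start, stop].

    For a quadratic, the slope at the midpoint times the width of the
    interval gives the exact change, so no numerical integration is needed.
    """
    midpoint = (start + stop) / 2.0
    return slope_fn(midpoint, **held_fixed) * (stop - start)


so_step = step_change(d_era_d_so, 0.00, 0.01)
gb_steps = [step_change(d_era_d_gb, a, a + 0.05) for a in (0.00, 0.05, 0.10)]
print(round(so_step, 4), [round(x, 4) for x in gb_steps])
```

Passing `so_pa=` or `bb_pa=` to `step_change` shows how the ground-ball steps move once the other rates are no longer held at zero.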
The interaction terms that follow are probably even more important, and represent the major contributions of SIERA to ERA estimation.
- (BB/PA)*((GB-FB-PU)/PA) has an insignificant but negative coefficient. This suggests that pitchers with higher walk rates will prevent more runs by generating ground balls than pitchers with low walk rates, but it does not offer statistical proof. However, for reasons suggested above, the significance of the interaction between strikeouts and ground balls implies that this is probably true as well, and that only sample size is holding us back. There are two reasons to explain the negative effect.
First, pitchers who put more runners on first will get more double plays from generating ground balls than pitchers who do not put many hitters on first base.
Second, pitchers who allow more fly balls allow more home runs. These are more likely to be solo home runs if the pitcher does not give out many free bases on balls so they will not be as damaging. Getting ground balls is even more important for pitchers with high walk rates since they can avoid multi-run dingers. It seems particularly inaccurate that FIP puts a coefficient of 13 on HR/IP for all pitchers, regardless of their walking exploits. Solo shots do something different to ERA than grand slams.
- (SO/PA)*((GB-FB-PU)/PA) has a significant positive coefficient. Pitchers who strike out a lot of hitters benefit from ground balls less than pitchers who allow many hitters to put the ball in play.
This follows the same logic as the walk/ground-ball interaction, since pitchers who strike out fewer hitters allow more balls in play. This leads to more runners reaching base who can conceivably be doubled up on ground balls or allowed to score on multi-run fly ball blasts.
Thus, these four points have shown us that strikeouts have a diminishing return as you accrue more of them, ground balls have an increasing return the higher your tally, and ground balls are more beneficial to pitchers who allow more walks or balls in play, especially because fly balls are more detrimental to pitchers who allow more runners on base.
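The interaction effects can be illustrated by asking how much the same ground-ball gain is worth to different pitchers. The sketch below uses only the ground-ball-related coefficients from the reduced table; the pitcher profiles are hypothetical, chosen purely to contrast high and low walk and strikeout rates.

```python
def gb_benefit(so_pa, bb_pa, gb_start=0.0, gb_stop=0.10):
    """Change in fitted ERA when (GB-FB-PU)/PA moves from gb_start to gb_stop.

    Only terms involving the ground-ball rate matter, since every other
    term cancels when the two fitted values are subtracted.
    """
    linear = (-1.721 + 9.561 * so_pa - 4.027 * bb_pa) * (gb_stop - gb_start)
    quadratic = -7.069 * (gb_stop ** 2 - gb_start ** 2)
    return linear + quadratic


# Hypothetical profiles as (SO/PA, BB/PA) pairs.
profiles = {
    "low walks":       (0.18, 0.05),
    "high walks":      (0.18, 0.12),
    "low strikeouts":  (0.12, 0.08),
    "high strikeouts": (0.28, 0.08),
}
for label, (so_pa, bb_pa) in profiles.items():
    print(f"{label:<16}{gb_benefit(so_pa, bb_pa): .3f}")
```

A more negative number means a larger ERA reduction from the same ground-ball gain, so under these coefficients the high-walk and low-strikeout profiles should show the bigger benefit.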
How beneficial are these results? In Part 4 of our introductory series on SIERA, the estimator will be put to the test, both at predicting same-year ERA better than other estimators that use similar statistics and at predicting future-year ERA better than any other estimator out there.
There certainly was some indication that IBB led to fewer runs, particularly with respect to the ground ball term, but at this sample size we figured it was probably best not to do something that could be construed as data mining. We also felt that the gains from distinguishing between BB & IBB seemed negligible anyway. That is a good point, though. Thanks for highlighting it.
Honestly, though, great work. This is neat stuff.
Verbose means "wordy." Silver's description wasn't wordy. A simple edit will do here: "... formula was first described by Nate Silver...".
Have you done multiple train-test experiments on your data set with and without this parameter? Not only just train on 2003-2008 test on 2009, but also 2003-2005 + 2007-2009 and test on 2006, etc. etc. and see if BB*GB ever has a significant predictive value?
We tested it on individual years and sets of years and the coefficient jumped all over the place from much more negative even to more positive. The -4.027 number is probably a pretty good approximation, though.
If we left it out and re-ran the regression, it would move the SIERAs by no more than .10 runs, which is pretty much the magnitude that you would expect the term to be. It's important, but it's not going to show as significant in this sample size.
Thanks for the suggestion though. Definitely was an important thing to check.
The other possibility in cases like this is some kind of multicollinearity -- that there's another term that is sufficiently correlated with GB*BB that you can't interpret their coefficients independently. Did you check for that?
I don't know what else it would be correlated with that would get in the way. I doubt it though. If you think about what the implications are of high walk rates and high ground ball rates, you'd think it adds a few double plays a year to have both skills, which is exactly what this type of coefficient around -4.0 would suggest.
The main "cost" of including this irrelevant variable is parsimony. While you are arguing for logical completeness (and keeping open the possibility that the term will matter when you extend the data set), it just makes your equation a bit busier than it needs to be.
A variable that really ought to be significant, but isn't, is a possible warning sign of multicollinearity. There's some pretty good discussion and advice at
http://www.nd.edu/~rwilliam/stats2/l11.pdf
http://www.baseball-reference.com/about/parkadjust.shtml.
Biggest unanswered question: what's the minimum BFP for inclusion? I've found no loss in year-to-year K/PA correlation down to 260. If you didn't go that low, you can increase your sample size.
My biggest disappointment is that you started with ERA. Granted, that's the stat we look at. But there's no good reason to ignore the (very accurately) quantifiable errors in RA caused by good or bad inherited runner support. You have that data here ("Fair RA," IIRC).
And you probably should have wrestled with R vs ER. Personally, I believe in keeping track of UER but doing it exactly the way you adjust for inherited runners -- the pitcher is credited with the average change in Run Expectancy caused by the error rather than the number of UER that actually end up scoring. (This actually only works for ROE; for errors leading only to base advances you need a "subsequently rendered moot" adjustment, so it does get tricky.) I bet there's a correlation of GB% to errors and hence UER ... you may have been better off regressing to Fair RA (adjusted only for inherited runners) with a separate term estimating ER/R. Or regressed to Fair RA and used a fixed ER/R, which is just using RA but scaling it to look like ERA.
Finally, I've never kept a term with p = .56 no matter how strongly I felt it deserved to stay. That is not trending towards significance and I think it's wishful thinking to expect it to get there with a bigger sample. Although I am at a loss to explain why it's not showing up. I would experiment with taking out the straight, non-squared GB term and see if that helps this one.
We used 40 IP as a minimum.
We checked RA by the same method (though not FRA) and got basically the same coefficients with the intercept being about 0.4-0.5 higher, so since people are familiar with ERA this is easiest to do. Fair RA is an intriguing idea, though.
The reason we kept the GB*BB term with p=.56 is that (a) we don't think the effect is bigger than something around -4, and it would take 20 years of batted ball data for it to be significant, and (b) the exclusion of it, while re-running the regression and generating new SIERAs would not change anybody's ERA by 0.10. It's just too small of a difference to make a fuss about.
You really should try removing the straight GB term (the rationale being that you've already got it squared and there's no logic that says it needs to be fully quadratic) and see what happens with the GB*BB one. I'm just personally curious because I've done so many of these multiple regressions and I've seen a lot of funky things happen when you take out one term.
*It's worth noting, though, that increasing the sample size with noisy data can give you worse (less significant) regression terms.
Think of it as a regression showing an equation of:
SIERA = a + (b + c*SO_PA)^2 + d*BB_PA + (e + f*GB_FB)^2 + g*GB_FB*SO_PA + h*GB_FB*BB_PA. This way the effect of GB_FB is minimized at a value determined by where f = -e which can move rather than where f = 0. It's a more general assumption to leave it in there even if it cuts the GB*BB term in half and makes it appear insignificant.
In general, I don't think there's any rationale for keeping a term as both linear and squared if the squared term is significant and leaving out the linear term improves the overall regression. In this case, the interactions of GB with itself and with K and BB rates appear to be so important that if you include them you don't need to include the term directly. That may make the seeming illogic of not having the term directly more palatable to consider.
The one problem I can see in general is that a pitcher with GB = FB + PU is not an average pitcher and yet he's the baseline that's determining the constant (i.e., he's contributing your unknown variable "e" to it). I would have begun by normalizing all the data, so that GB_FB = 0 meant a pitcher with an average rate. This would be the best solution to the problem you're worrying about, since by definition the effect of GB can be regarded as minimized for an average pitcher. Then you run the regressions and you convert the coefficients to useful ones by reversing the normalization.
Second random comment, but has anyone ever looked at foul balls or swinging strikes as a % of strikes to hone predictions, or does that stuff all flow neatly into K rate?
Thanks.
I don't know about foul balls and swinging strikes as a percent of strikes, but I suspect it would be interesting to look at. If anybody has, Russell Carleton has-- he's the foul ball expert.
I would also be concerned about correlation in the error term across years for a given player, for obvious reasons, although that probably isn't something easily controlled for.
This is still a pretty impressive piece of work, so bravo to the authors.
Why would the error term being correlated across years for pitchers matter as much for what we're doing? I agree that's definitely true, especially because it includes team defense, but still I'm not sure that is a big deal.
Thanks for your comments and approach. It's interesting to see how you frame it.
As for the autocorrelation, it can bias your results and lower efficiency. Basically I'd be worried that, for example, the coefficient on GB% would be biased high if it's picking up on good infield defense, since you'd expect teams with good infield defenses to favor ground ball pitchers. I guess that wouldn't be a problem for forecasting unless that tendency is changing over time though. Your standard errors would be biased either way, unfortunately.
I suppose a good part of my concern is run-of-the-mill applied economist paranoia. Obviously here the important thing is the ability to forecast, not on the interpretation of the parameters.