Part 1 of this series introduced Skill-Interactive Earned Run Average, or SIERA, an ERA estimator that more accurately gauges a pitcher’s run prevention based on the skills he controls. Part 1 focused on the introductory aspects, much like going over a syllabus on the first day of class, but today we'll recap the steps that led to SIERA’s creation. One of the major reasons for SIERA’s existence is that prior estimators broke plenty of ground. In this respect, SIERA represents another evolutionary step in the process of removing the effects of defense on pitcher statistics, effects that came into play when Henry Chadwick conjured up the earned run average metric over a century ago.
Chadwick’s metric proved popular at the time and remains one of the most frequently cited tools for determining the quality of pitchers. Back at the turn of the 21st century, however, Voros McCracken shocked the nation with seminal research on the roles of defense and luck in ERA, finding that hurlers exhibited little persistence in their BABIP (batting average on balls in play), and concluding that more went into Chadwick’s toy than what the pitcher could control. This led to the invention of FIP, or Fielding Independent Pitching, which estimates ERA from the three statistics McCracken found to be persistent—walks, strikeouts and home runs. FIP essentially marked the beginning of approximating ERA through defensive independence, and can be calculated as: FIP = 3.20 + (3*BB – 2*K + 13*HR)/IP, where the 3.20 is a constant contingent upon the league and year, used to place the estimator on the ERA scale.
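As a quick illustration, the FIP formula quoted above can be written as a small function. This is a minimal sketch: the function name, argument names, and the sample season line are made up for the example, and the 3.20 constant is simply the value cited in the text, which varies by league and year.

```python
def fip(walks, strikeouts, home_runs, innings, constant=3.20):
    """FIP = constant + (3*BB - 2*K + 13*HR) / IP, as quoted in the article."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return constant + (3 * walks - 2 * strikeouts + 13 * home_runs) / innings


# Hypothetical season line (not a real pitcher): 50 BB, 200 K, 20 HR in 200 IP.
example_fip = fip(walks=50, strikeouts=200, home_runs=20, innings=200.0)
print(round(example_fip, 2))  # 3.25
```

Note that hits on balls in play never enter the calculation, which is the whole point of a defense-independent estimator.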
FIP will generally provide a better estimate of a pitcher’s skill level than his ERA, because the latter is exposed to bloop hits and nabbed line drives. Such events cause BABIP to fluctuate, and those hits (or the lack thereof) can accumulate into a rift between measured success and actual talent. The problem is that persistence is lacking not only in BABIP but also in the rate of home runs per fly ball: intra-class correlations over the span of 2003-09 show that HR/FB, no matter how one chooses to calculate it (out of outfield flies or total flies), does not produce a correlation greater than 0.15, and home runs per outfield fly ball net of team home runs per outfield fly ball (to control for park effects) leaves an ICC of only 0.084. FIP attempts to correct for BABIP luck but fails to correct for the luck inherent in HR/FB, perpetually over- or underrating certain types of pitchers in the process.
The natural way to correct for some of this home run luck is to adjust FIP through the use of expected, not actual, dingers. The expected tally is calculated by multiplying the league average rate of home runs per outfield flies, as opposed to also lumping in popups, by the total number of outfield flies. These corrections comprise xFIP, created by The Hardball Times and currently housed at Fangraphs. If the league average HR/FB is 18 percent and a pitcher allows 85 outfield fly balls, his expected home runs tally would equal 15.3. If he actually allowed 23 home runs, then his xFIP would be lower than his unadjusted FIP, as the poor luck with home runs would be expected to even out in the next year.
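The arithmetic in that example can be sketched in a few lines. The 18 percent rate, the 85 outfield flies, and the 23 actual home runs come from the paragraph above; the walk, strikeout, and innings totals are invented purely so the two estimates can be compared, and the function names are hypothetical.

```python
def expected_home_runs(league_hr_per_outfield_fly, outfield_flies):
    """Expected HR = league-average HR per outfield fly * outfield flies allowed."""
    return league_hr_per_outfield_fly * outfield_flies


def fip_formula(walks, strikeouts, home_runs, innings, constant=3.20):
    """Same FIP formula quoted earlier; 3.20 is the constant cited in the text."""
    return constant + (3 * walks - 2 * strikeouts + 13 * home_runs) / innings


# From the article's example: 18% league rate, 85 outfield flies, 23 actual HR.
xhr = expected_home_runs(0.18, 85)  # about 15.3

# Invented supporting totals (assumptions): 50 BB, 150 K, 180 IP.
actual_fip = fip_formula(50, 150, 23, 180.0)
x_fip = fip_formula(50, 150, xhr, 180.0)  # swap actual HR for expected HR

print(round(xhr, 1), x_fip < actual_fip)
```

Because the pitcher allowed more home runs than expected, replacing the actual total with the expected one pulls the estimate down.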
Nate Silver introduced QERA to Baseball Prospectus in 2006 using a similar approach, while acknowledging that run scoring is non-linear because more base runners lead to more runs allowed. QERA used a quadratic form that incorporated walk, strikeout, and ground ball rates, keeping constant the usage of walks and strikeouts but more accurately modeling home runs surrendered through the ground ball rate. Silver also made another improvement by looking at walk and strikeout rates per plate appearance, instead of per nine innings. The reason is quite intuitive: a lower BABIP leads to more innings pitched and therefore lower K/9, BB/9, and HR/9 rates, even though DIPS does not credit BABIP as being within the pitcher’s control.
Unfortunately, the adjustment methodology was still flawed, as QERA used the percentage of ground balls per ball in play, instead of per plate appearance. This has been criticized due to the idea of using common denominators in a formula. Walks and strikeouts were per plate appearance, so why weren’t grounders treated the same way? The criticism is certainly valid, since pitchers who allow fewer balls in play will gain less by having a higher percentage of grounders, while those who allow more will see a commensurate gain.
Consider a pitcher who strikes out or walks half of the hitters he faces. Why should his ground ball rate per ball in play be as significant as that of another pitcher who neither strikes out nor walks anybody at all? SIERA corrects this issue by using a variable suggested at the Inside the Book blog: (GB-(FB+PU))/PA. This variable corrects for the common denominator problem and simultaneously treats line drives neutrally, looking only at the extent to which grounders exceed or fall short of the sum of outfield flies and popups. The neutral treatment of liners is critical given their lack of persistence: the individual line drive rate, isolated from team, produces a .007 ICC.
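To make the denominator issue concrete, here is a small sketch comparing the two pitchers described above. All batted-ball counts are invented for illustration, the function names are hypothetical, and "balls in play" is simplified to mean all batted balls.

```python
def gb_per_ball_in_play(gb, fb, pu, ld):
    """Ground balls as a share of batted balls (the QERA-style denominator)."""
    return gb / (gb + fb + pu + ld)


def net_gb_per_pa(gb, fb, pu, pa):
    """The variable quoted in the text: (GB - (FB + PU)) / PA. Liners are ignored."""
    return (gb - (fb + pu)) / pa


PA = 600  # both hypothetical pitchers face 600 batters

# Pitcher A strikes out or walks half his hitters: only 300 batted balls.
a_bip = gb_per_ball_in_play(gb=150, fb=75, pu=15, ld=60)
a_net = net_gb_per_pa(gb=150, fb=75, pu=15, pa=PA)

# Pitcher B never strikes out or walks anyone: 600 batted balls, same mix.
b_bip = gb_per_ball_in_play(gb=300, fb=150, pu=30, ld=120)
b_net = net_gb_per_pa(gb=300, fb=150, pu=30, pa=PA)

print(a_bip, b_bip)  # identical per-ball-in-play rates
print(a_net, b_net)  # the per-PA variable is twice as large for pitcher B
```

The per-ball-in-play rate cannot tell the two apart, while the per-plate-appearance variable reflects how often each pitcher's ground ball tendency actually comes into play.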
Reverting to QERA for a minute, another advantage it has over competitors is that it implicitly considers non-linear returns to each term (K%, BB%, GB%), and interactions between those terms. A pitcher’s walk rate impacts his QERA at an increasing rate, as additional walks begin to advance batters who have already walked, and pitchers who walk a great many hitters may benefit more from grounders than their strike-zone stingy compadres.
The formula for QERA is:
QERA = (a + b*GB% + c*BB% + d*SO%)^2
where a = 2.69, b = -0.66, c = 3.88, and d = -3.4. Un-foiled, this means that the following is also true:
QERA = a^2 + b^2*GB%^2 + 2*a*b*GB% + c^2*BB%^2 + 2*a*c*BB% + d^2*SO%^2 + 2*a*d*SO% + 2*b*c*GB%*BB% + 2*b*d*GB%*SO% + 2*c*d*BB%*SO%
QERA considers that the effect of BB% on ERA may be non-linear, and that if, for example, c^2 is large, walks may increase ERA at an increasing rate; jumping from 4-8 percent may not hurt ERA as much as a jump from 8-12 percent. QERA also allows for ground balls to be more beneficial for pitchers who walk a greater percentage of hitters, as the term b*c*GB%*BB% is negative; increasing your ground ball percent from 40 to 45 may do more for a pitcher who has a high walk rate than for one who walks fewer hitters.
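Both effects can be checked numerically. The sketch below assumes QERA is the square of a linear combination of the three rates, using the constants quoted above (a = 2.69, b = -0.66, c = 3.88, d = -3.4) and rates expressed as fractions; the sample rates are made up for illustration.

```python
A, B, C, D = 2.69, -0.66, 3.88, -3.4  # constants quoted in the article


def qera(gb_rate, bb_rate, so_rate):
    """Assumed form: (a + b*GB% + c*BB% + d*SO%)^2, with rates as fractions."""
    linear = A + B * gb_rate + C * bb_rate + D * so_rate
    return linear ** 2


# Walks hurt at an increasing rate: compare a 4 -> 8 percent jump with 8 -> 12.
GB, SO = 0.45, 0.18  # hypothetical ground ball and strikeout rates
jump_low = qera(GB, 0.08, SO) - qera(GB, 0.04, SO)
jump_high = qera(GB, 0.12, SO) - qera(GB, 0.08, SO)
print(jump_low < jump_high)  # True

# Raising GB% from 40 to 45 helps a high-walk pitcher more than a low-walk one.
gain_low_bb = qera(0.40, 0.04, SO) - qera(0.45, 0.04, SO)
gain_high_bb = qera(0.40, 0.12, SO) - qera(0.45, 0.12, SO)
print(gain_low_bb < gain_high_bb)  # True
```

Both comparisons follow from squaring: the larger the linear sum already is, the more any further change in one rate moves the result.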
Unfortunately, this functional form is very limiting. The three components are not all quadratic: the rates of whiffs and grounders are, but the rate of walks is not. Since a squared term cannot be negative, b^2, c^2, and d^2 are all positive, but our results show that the coefficient in place of b^2 should be negative, since more ground balls can drive down ERA at an increasing rate.
Another limit of this functional form is that the interaction terms (e.g. b*c for the product of ground ball and walk rates) are limited by what numbers for a, b, c, and d are the most realistic for the earlier terms in the equation. The term in place of c*d should probably be zero, as pitchers who walk a great many hitters do not necessarily benefit any more from strikeouts than pitchers who walk next to nobody. Within QERA, however, the only way to zero out that term is to set c or d to zero, and QERA would then predict that walk or strikeout rates have nothing to do with run prevention, a clearly false result.
SIERA’s regression treats each of these terms individually, replacing four parameters to estimate ERA with 10 to begin the analysis, creating the following formula:
SIERA = a + b*GB%^2 + c*GB% + d*BB%^2 + e*BB% + f*SO%^2 + g*SO% + h*GB%*BB% + i*GB%*SO% + j*BB%*SO%
Then, insignificant terms are removed; in this case, the level of significance is derived from the p-value reported in the regression as well as clinical assumptions.
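The structure of that 10-term regression, and what removing a term means mechanically, can be sketched as follows. This is only an illustration: the coefficient values are invented placeholders rather than SIERA's fitted coefficients, the function names are hypothetical, and a real analysis would re-estimate the remaining coefficients after dropping a term instead of simply deleting it.

```python
def regression_terms(gb, bb, so):
    """The ten regressors in the formula above, with rates given as fractions."""
    return {
        "const": 1.0,
        "GB^2": gb ** 2, "GB": gb,
        "BB^2": bb ** 2, "BB": bb,
        "SO^2": so ** 2, "SO": so,
        "GB*BB": gb * bb, "GB*SO": gb * so, "BB*SO": bb * so,
    }


def estimate(coefficients, gb, bb, so):
    """Sum of coefficient * term; any term missing from the dict counts as zero."""
    terms = regression_terms(gb, bb, so)
    return sum(coefficients.get(name, 0.0) * value for name, value in terms.items())


def drop_terms(coefficients, insignificant):
    """Return a copy of the coefficients without the terms judged insignificant."""
    return {name: value for name, value in coefficients.items()
            if name not in insignificant}


# Placeholder coefficients (assumptions for illustration only, NOT SIERA's).
placeholder = {
    "const": 5.0, "GB^2": -1.0, "GB": -1.0, "BB^2": 0.0, "BB": 4.0,
    "SO^2": 5.0, "SO": -9.0, "GB*BB": -3.0, "GB*SO": 2.0, "BB*SO": 0.5,
}

full = estimate(placeholder, gb=0.45, bb=0.08, so=0.18)
reduced = estimate(drop_terms(placeholder, {"BB*SO"}), gb=0.45, bb=0.08, so=0.18)
print(len(regression_terms(0.45, 0.08, 0.18)), round(full - reduced, 4))
```

Unlike the squared form, each of the ten coefficients here is free to take either sign or to vanish independently of the others.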
Part 3 will investigate more closely what went into the formula as well as the process of deriving the end result, while the rest of the week will test SIERA against other estimators and highlight specific pitchers for whom this estimator more accurately gauges skill-based contributions.
I'm looking forward to the meat of statistics on actual pitchers. I would also be interested in a comparison (looking backward to previous years) between what the older metrics showed for pitchers and what SIERA would have shown.
Do you get any better results using a cubic equation?
I'm sure you're aware of the positive correlation between BB% and HR rates and your methodology will capture that. However, the baseball c.w. includes a historical class of pitcher (Jenkins, Hunter, et al) who featured elevated HR rates in conjunction with low BB rates. They may be worth looking at.
Essentially, good control improves K and BB rates and depresses BABIP and perhaps HR/FB. But pounding the strike zone rather than nibbling (a difference of approach, not skill, perhaps) also improves K and BB rates but may well increase BABIP and HR/FB. Daisuke Matsuzaka looks like he is actually achieving lower BABIP by nibbling (whether this is sustainable is another question entirely) -- again, a correlation of walk rate to hardness of contact that's opposite the expected.
Another guy I can think of who appears to have a true BABIP skill is Jared Weaver, who gets a ridiculous BABIP on his FB given his swing-and-miss rate. I think that's a function of the deception in his delivery, where the movement on the pitch doesn't match the upper arm angle.
One thing a metric like SIERA will allow us to do is identify the consistent under- and over-performers better than past metrics, and then we can examine them with pitch/fx and the like. Those findings may never be included in the metric but would allow us to determine after a single year's over- or under-performance whether a pitcher might be one of the rare guys who has a true, non-SIERA measurable BABIP or HR/FB skill (or lack of same).
When not using HR / FB (i.e., in the many cases where FB data is unavailable), you should always use HR / Contact, which is not only the most logical but also has a stronger year-to-year correlation than HR with any other denominator I've looked at.
I don't see any way for this new metric to help with that question, but then I haven't really thought hard about it yet. If you've already seen it, please share.
I do think SIERA should do a better job of accounting for these kinds of correlations, especially when they affect run scoring differently as a set of skills rather than as a sum of their parts, but I'm just not sure that BB% and HR% are really correlated in general based on the data I'm looking at (2003-09 pitchers here).
(Without the adjustments, r = .126, p = .000026.)
This is certainly correct, but I want to point out that it isn't the only reason that FIP (and other defense independent pitching statistics) is better than ERA. Because the nature of a ball in play is to be an out about 69% of the time, it is improbable that a pitcher will give up three consecutive hits, but it will happen. When a pitcher gives up three consecutive hits, it is likely that he allows a run. ERA is sensitive to such events, and will under-estimate the talent of a pitcher who is victim to this essentially random event. Another issue is that pitchers are not especially more likely to give up a hit with 2 outs and no men on than with no outs and no men on, but a pitcher is much more likely to allow a run in the latter case. The moral of this story is that actual runs allowed will tend to track factors beyond a pitcher's control, like whether the hit allowed was with no outs or with two. But defense independent estimators will ignore such facts and consequently, when well constructed, will be better estimators of pitcher skill.
http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=38
I don't understand your usage of "un-foil." I hadn't heard that term before this series of articles. Some web searching indicates that it's typically used to describe factoring a second-degree polynomial into two first-degree polynomials. I guess it's inspired by "FOIL," the acronym for "First, Outer, Inner, Last," a means of remembering how to multiply two first-degree polynomials. You aren't factoring, though, but expanding the QERA polynomial product. So I'm confused. :)
regress
vb [rɪˈgrɛs]
1. (intr) to return or revert, as to a former place, condition, or mode of behaviour
2. (Mathematics & Measurements / Statistics) (tr) Statistics to measure the extent to which (a dependent variable) is associated with one or more independent variables
It's not the OED, but I think it indicates that this is pretty standard usage.
Regress is pretty standard as a verb at this point, I think, but it may not be perfectly used all the time within sabermetrics.