Introducing SIERA: Part 2

February 9, 2010

Part 1 of this series introduced Skill-Interactive Earned Run Average, or SIERA, an ERA estimator that more accurately gauges a pitcher’s run prevention from the skills he controls. Part 1 covered the introductory aspects, much like going over a syllabus on the first day of class; today we'll recap the steps that led to SIERA’s creation. One of the major reasons for SIERA’s existence is that prior estimators broke plenty of ground. In this respect, SIERA represents another evolutionary step in the process of removing the effects of defense from pitcher statistics, effects that came into play when Henry Chadwick conjured up the earned run average metric over a century ago.

Chadwick’s metric proved popular at the time and remains one of the most frequently cited tools for determining the quality of pitchers. Back at the turn of the 21st century, however, Voros McCracken shocked the nation with seminal research on the roles of defense and luck in ERA, finding that hurlers exhibited little persistence in their BABIP (batting average on balls in play), and concluding that more went into Chadwick’s toy than what the pitcher could control. This led to the invention of FIP, or Fielding Independent Pitching, which estimates ERA from the three statistics McCracken found to be persistent: walks, strikeouts and home runs. FIP essentially marked the beginning of approximating ERA through defensive independence, and can be calculated as: FIP = 3.20 + (3*BB - 2*K + 13*HR)/IP, where the 3.20 is a constant that depends on the league and year and places the estimator on the ERA scale.
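To make the arithmetic concrete, here is a minimal sketch of the FIP formula as written above. The function name and the season line are hypothetical, invented purely for illustration, and the 3.20 constant is simply the value quoted in the text rather than a figure for any particular league or year.

```python
def fip(bb, k, hr, ip, constant=3.20):
    """FIP as written in the text: constant + (3*BB - 2*K + 13*HR) / IP.

    The default constant is the 3.20 quoted in the article; in practice it
    varies by league and year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


# Hypothetical season line (not a real pitcher): 50 BB, 180 K, 20 HR, 200 IP.
# Numerator: 150 - 360 + 260 = 50, so FIP = 3.20 + 50/200.
print(round(fip(bb=50, k=180, hr=20, ip=200.0), 2))  # 3.45
```

Note that strikeouts enter with a negative sign, so a pitcher with enough strikeouts can post a FIP below the constant.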

It is very true that FIP will provide a better estimate of a pitcher’s skill level than his ERA, because the latter is open to bloop hits or nabbed line drives. Bloops and other unfortunate events can cause BABIP to fluctuate, and those hits, or the lack thereof, can aggregate to create a rift between measured success and actual talent. The problem lies in the lack of persistence in BABIP as well as in the rate of home runs per fly ball: intra-class correlations (ICC) over the span of 2003-09 show that HR/FB, no matter how one chooses to calculate it (out of outfield flies or total flies), does not produce an r greater than 0.15, and home runs per outfield fly ball net of team home runs per outfield fly ball (to control for park effects) leaves an ICC of only 0.084. FIP attempts to correct for BABIP luck but fails to correct for the luck inherent in HR/FB, perpetually over- or underrating certain types of pitchers in the process.

The natural way to correct for some of this home run luck is to adjust FIP through the use of expected, not actual, dingers. The expected tally is calculated by multiplying the league average rate of home runs per outfield fly, as opposed to also lumping in popups, by the pitcher’s total number of outfield flies. These corrections comprise xFIP, created by The Hardball Times and currently housed at FanGraphs. If the league average HR/FB is 18 percent and a pitcher allows 85 outfield fly balls, his expected home runs tally would equal 0.18 * 85 = 15.3. If he actually allowed 23 home runs, then his xFIP would be lower than his unadjusted FIP, as the poor luck with home runs would be expected to even out in the next year.
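The example in the paragraph above can be sketched in a few lines. This assumes xFIP is simply the FIP formula with expected home runs substituted for actual ones, which is how the text describes it; the walk, strikeout, and innings figures are hypothetical and only there so the two estimators can be compared.

```python
def expected_home_runs(outfield_flies, league_hr_per_outfield_fly):
    """Expected HR = league HR per outfield fly * pitcher's outfield flies."""
    return outfield_flies * league_hr_per_outfield_fly


def fip(bb, k, hr, ip, constant=3.20):
    """FIP as given earlier in the article."""
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


def xfip(bb, k, outfield_flies, ip, league_hr_per_outfield_fly, constant=3.20):
    """FIP with expected home runs in place of actual home runs."""
    x_hr = expected_home_runs(outfield_flies, league_hr_per_outfield_fly)
    return fip(bb, k, x_hr, ip, constant)


# Numbers from the text: 18 percent league rate, 85 outfield flies, 23 actual HR.
x_hr = expected_home_runs(85, 0.18)
print(round(x_hr, 1))  # 15.3

# Hypothetical BB, K, and IP totals, chosen only for illustration.
actual = fip(bb=40, k=100, hr=23, ip=120.0)
adjusted = xfip(bb=40, k=100, outfield_flies=85, ip=120.0,
                league_hr_per_outfield_fly=0.18)
print(adjusted < actual)  # True: the unlucky pitcher's xFIP is below his FIP
```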

Nate Silver introduced QERA to Baseball Prospectus in 2006 using a similar approach, while acknowledging that run scoring is non-linear because more base runners lead to more runs allowed. QERA used a quadratic form that incorporated walk, strikeout, and ground ball rates, keeping constant the usage of walks and strikeouts but more accurately modeling home runs surrendered through the ground ball rate. Silver also made another improvement by looking at walk and strikeout rates per plate appearance, instead of per nine innings. The reason is quite intuitive: a lower BABIP will lead to more innings pitched for the same number of batters faced, and thus to lower K/9, BB/9, and HR/9 rates, even though DIPS does not treat BABIP as within the pitcher’s control.
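A toy calculation can illustrate why the per-nine-innings denominator is a problem. The sketch below rests on simplifying assumptions that are not in the article: every ball in play is either a hit or exactly one out, with no home runs, double plays, errors, or outs on the bases. Under those assumptions, two hypothetical pitchers with the identical strikeout rate per plate appearance end up with different K/9 figures purely because of BABIP.

```python
def k_per_nine(pa, k, bb, babip):
    """K/9 under a toy model: each ball in play is a hit or exactly one out.

    Assumes no home runs, double plays, errors, or baserunning outs.
    """
    balls_in_play = pa - k - bb
    hits = balls_in_play * babip
    outs = k + (balls_in_play - hits)
    innings = outs / 3
    return 9 * k / innings


# Two hypothetical pitchers: 100 batters faced, 20 strikeouts, no walks.
lucky = k_per_nine(pa=100, k=20, bb=0, babip=0.250)
unlucky = k_per_nine(pa=100, k=20, bb=0, babip=0.350)

print(round(lucky, 2))    # 6.75
print(round(unlucky, 2))  # 7.5
# Both strike out 20 percent of batters, yet the low-BABIP pitcher shows
# the lower K/9 because he records more outs (innings) per batter faced.
```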

Unfortunately, the adjustment methodology was still flawed, as QERA used the percentage of ground balls per ball in play, instead of per plate appearance. This has been criticized on the grounds that a formula should use common denominators. Walks and strikeouts were per plate appearance, so why weren’t grounders treated the same way? The criticism is certainly valid, since pitchers who allow fewer balls in play will gain less by having a higher percentage of grounders, while those who allow more will see a commensurate gain.

Consider a pitcher who strikes out or walks half of the hitters he faces. Why should his ground ball rate per ball in play be as significant as that of another pitcher who neither strikes out nor walks anybody at all? SIERA corrects this issue by using a variable suggested at the Inside the Book blog: (GB-(FB+PU))/PA. This variable corrects for the common denominator problem and simultaneously treats line drives neutrally. The latter fix is critical given the lack of persistence of liners (the individual rate, isolated from team, produces a .007 ICC), and the variable looks at the extent to which grounders exceed or fall short of the sum of outfield flies and popups.
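The contrast between the two pitchers described above can be sketched as follows. The batted-ball counts are hypothetical, chosen so that both pitchers have the same 50 percent ground ball rate per ball in play while one of them puts only half as many balls in play.

```python
def net_ground_ball_rate(gb, fb, pu, pa):
    """The variable from the text: (GB - (FB + PU)) / PA.

    Line drives appear in neither the numerator nor as a penalty, so they
    are treated neutrally.
    """
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return (gb - (fb + pu)) / pa


# Hypothetical pitcher A: 200 PA, half end in a strikeout or walk.
# 100 balls in play: 50 GB, 25 FB, 5 PU, 20 LD.
pitcher_a = net_ground_ball_rate(gb=50, fb=25, pu=5, pa=200)

# Hypothetical pitcher B: 200 PA, no strikeouts or walks.
# 200 balls in play: 100 GB, 50 FB, 10 PU, 40 LD.
pitcher_b = net_ground_ball_rate(gb=100, fb=50, pu=10, pa=200)

print(pitcher_a)  # 0.1
print(pitcher_b)  # 0.2
```

Per ball in play the two look identical, but per plate appearance the ground ball tendency of pitcher B carries twice the weight, which is the point of the common denominator.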

Returning to QERA for a minute, another advantage it has over competitors is that it implicitly considers non-linear returns to each term (K%, BB%, GB%), and interactions between those terms. A pitcher’s walk rate impacts his QERA at an increasing rate as additional walks begin to advance batters who have already walked, and pitchers who walk a great many hitters may benefit more from grounders than their strike-zone stingy compadres.

The formula for QERA is:

(a + b*GB% + c*BB% + d*SO%)^2

… where a = 2.69, b = -0.66, c = 3.88, and d = -3.4. Un-foiled, this means that the following is also true:

QERA = a^2 + b^2*GB%^2 + 2*a*b*GB%
           + c^2*BB%^2 + 2*a*c*BB%
           + d^2*SO%^2 + 2*a*d*SO%
           + 2*b*c*GB%*BB% + 2*b*d*GB%*SO% + 2*c*d*BB%*SO%

QERA considers that the effect of BB% on ERA may be non-linear, and that if, for example, c^2 is large, walks may increase ERA at an increasing rate; jumping from 4 to 8 percent may not hurt ERA as much as a jump from 8 to 12 percent. QERA also allows for ground balls to be more beneficial for pitchers who walk a greater percentage of hitters, as the coefficient on the GB%*BB% interaction is negative (b is negative and c is positive); increasing the ground ball rate from 40 to 45 percent may do more for a pitcher who has a high walk rate than for one who walks fewer hitters.
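Both claims can be checked numerically with the coefficients quoted above. This is a sketch that assumes the rates are entered as fractions (0.45 rather than 45), which the article does not state explicitly but which keeps the result on the ERA scale. The expanded version is the squared form multiplied out term by term, with the factor of two on each cross term that the algebra requires. The sample rates are hypothetical.

```python
# Coefficients quoted in the text for QERA.
A, B, C, D = 2.69, -0.66, 3.88, -3.4


def qera(gb, bb, so):
    """Squared form. Rates are assumed to be fractions (0.45, not 45)."""
    return (A + B * gb + C * bb + D * so) ** 2


def qera_expanded(gb, bb, so):
    """The same quantity multiplied out; each cross term carries a factor of 2."""
    return (A ** 2
            + B ** 2 * gb ** 2 + 2 * A * B * gb
            + C ** 2 * bb ** 2 + 2 * A * C * bb
            + D ** 2 * so ** 2 + 2 * A * D * so
            + 2 * B * C * gb * bb + 2 * B * D * gb * so + 2 * C * D * bb * so)


def walk_penalty(gb, so, bb_low, bb_high):
    """Rise in QERA when the walk rate goes from bb_low to bb_high."""
    return qera(gb, bb_high, so) - qera(gb, bb_low, so)


def ground_ball_benefit(bb, so, gb_low=0.40, gb_high=0.45):
    """Drop in QERA from raising the ground ball rate, at a given walk rate."""
    return qera(gb_low, bb, so) - qera(gb_high, bb, so)


# The two forms agree for a hypothetical pitcher.
print(abs(qera(0.45, 0.08, 0.18) - qera_expanded(0.45, 0.08, 0.18)) < 1e-9)

# Walks hurt at an increasing rate: 8 -> 12 percent costs more than 4 -> 8.
print(walk_penalty(0.45, 0.18, 0.08, 0.12) > walk_penalty(0.45, 0.18, 0.04, 0.08))

# Grounders help the high-walk pitcher more than the low-walk pitcher.
print(ground_ball_benefit(0.12, 0.18) > ground_ball_benefit(0.04, 0.18))
```

All three checks print True under these assumptions.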

Unfortunately, this functional form is very limiting. Not all three components behave quadratically: while the rates of whiffs and grounders do, the rate of walks does not. And since a squared number cannot be negative, b^2, c^2, and d^2 are all positive, yet our results show that the coefficient in place of b^2 should be negative, since more ground balls can drive down ERA at an increasing rate.

Another limit of this functional form is that the interaction terms (e.g. b*c for the product of ground ball and walk rates) are constrained by whichever values of a, b, c, and d are the most realistic for the earlier terms in the equation. The term in place of c*d should probably be zero, as pitchers who walk a great many hitters do not necessarily benefit any more from strikeouts than pitchers who walk next to nobody. But the only way to get there is for c or d to be zero, in which case QERA would predict that strikeout or walk rates have nothing to do with run prevention, a clearly false result.

SIERA’s regression treats each of these terms individually, replacing QERA’s four parameters with 10 to begin the analysis, creating the following formula:

SIERA = a  + b*GB%^2 + c*GB%
           + d*BB%^2 + e*BB%
           + f*SO%^2 + g*SO%
           + h*GB%*BB% + i*GB%*SO% + j*BB%*SO%

Then, insignificant terms are removed; in this case, the level of significance is derived from the p-value reported in the regression as well as clinical assumptions.
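One way to see what the extra parameters buy is to note that QERA is a special case of the ten-parameter form, one in which every coefficient is tied to just four numbers. The sketch below evaluates the general form and shows that plugging in the QERA-implied coefficients reproduces QERA exactly. The function names are hypothetical, rates are assumed to be fractions, and no fitted SIERA coefficients appear here since this article has not yet given them.

```python
def quadratic_terms(gb, bb, so):
    """The ten terms of the general form, in the order a through j."""
    return [1.0,
            gb ** 2, gb,
            bb ** 2, bb,
            so ** 2, so,
            gb * bb, gb * so, bb * so]


def estimate(coefficients, gb, bb, so):
    """Evaluate a + b*GB%^2 + c*GB% + ... + j*BB%*SO% for given coefficients."""
    terms = quadratic_terms(gb, bb, so)
    if len(coefficients) != len(terms):
        raise ValueError("expected exactly ten coefficients")
    return sum(c * t for c, t in zip(coefficients, terms))


def qera_as_ten_parameters(a, b, c, d):
    """Coefficients implied by expanding (a + b*GB% + c*BB% + d*SO%)^2."""
    return [a * a,
            b * b, 2 * a * b,
            c * c, 2 * a * c,
            d * d, 2 * a * d,
            2 * b * c, 2 * b * d, 2 * c * d]


# QERA's four published parameters, mapped into the ten-parameter form.
coeffs = qera_as_ten_parameters(2.69, -0.66, 3.88, -3.4)
direct = (2.69 - 0.66 * 0.45 + 3.88 * 0.08 - 3.4 * 0.18) ** 2
print(abs(estimate(coeffs, 0.45, 0.08, 0.18) - direct) < 1e-9)  # True

# Dropping an insignificant term amounts to setting its coefficient to zero,
# something the squared form cannot do for one term in isolation.
no_bb_so = list(coeffs)
no_bb_so[9] = 0.0  # remove the BB%*SO% interaction only
print(estimate(no_bb_so, 0.45, 0.08, 0.18) != estimate(coeffs, 0.45, 0.08, 0.18))
```

With ten free coefficients, a regression can make the squared ground ball term negative or zero out a single interaction without disturbing the rest of the formula.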

Part 3 will investigate more closely what went into the formula as well as the process of deriving the end result, while the rest of the week will test SIERA against other estimators and highlight specific pitchers for whom this estimator more accurately gauges skill-based contributions.


Eric Seidman

Matt Swartz


dsher84

2/09

Fascinating stuff, gents. Any chance I can cajole you, or other readers, into giving me a couple values from 2009 season pitchers so I can confirm my spreadsheet is right? Or, could anyone confirm these random values? Aardsma 3.21? Accardo 5.21? Mike Adams 2.12? Proofreading this formula is making my head hurt. I rarely wade too deeply into these more challenging articles, but this one has me glued to my monitor.


swartzm

2/09

Thanks. I think you are typing the formula in wrong though. I'm getting Aardsma 3.41, Accardo 5.20, and Adams 4.17.


EJSeidman

2/09

He may not be making the squared GB term negative if (GB-(FB+PU))/PA is negative.


robustyoungsoul

2/09

Really, really exciting stuff guys. Looking forward to part 3.


TGisriel

2/09

I'm glad you disclose the numbers and process. I'm glad you explain what you're doing and why it should be an improvement on earlier metrics. I am reminded, however, of the title of the edited collection of Bill James articles which went something along the lines of this time let's throw out the bones.

I'm looking forward to the meat of statistics on actual pitchers. I would also be interested in a comparison (looking backward to previous years) of what the older metrics showed for pitchers, and what SIERA would have shown.


prospero14

2/09

Huh. Once you multiply out the equation for QERA, it seems completely natural to ask: which quadratic function of GB%, BB%, and SO% best predicts future performance?

Do you get any better results using a cubic equation?


swartzm

2/09

This will be in tomorrow's article in more detail. The cubic thing is limited only because there just isn't enough data to finely tune it quite that much. Nothing comes out significant when I try that. Thanks for the question.


hotstatrat

2/09

Not to take anything away from Voros McCracken, whose work with BABIP and various stats was no doubt very important, but Bill James was the first person I read discussing the importance of team defense (let alone park effects, etc.) on ERA. That was in the 80s. As I recall he coined DER and discussed how non-strikeout pitchers rely more on having good defenses behind them.


nosybrian

2/10

True about James, but he was genuinely impressed by McCracken's finding, which influenced James' own subsequent work. See the citations in the Wikipedia article about Voros McCracken.


ericmvan

2/09

Here's food for thought for version 2.0.

I'm sure you're aware of the positive correlation between BB% and HR rates and your methodology will capture that. However, the baseball c.w. includes a historical class of pitcher (Jenkins, Hunter, et al) who featured elevated HR rates in conjunction with low BB rates. They may be worth looking at.

Essentially, good control improves K and BB rates and depresses BABIP and perhaps HR/FB. But pounding the strike zone rather than nibbling (a difference of approach, not skill, perhaps) also improves K and BB rates but may well increase BABIP and HR/FB. Daisuke Matsuzaka looks like he is actually achieving lower BABIP by nibbling (whether this is sustainable is another question entirely) -- again, a correlation of walk rate to hardness of contact that's opposite the expected.

Another guy I can think of who appears to have a true BABIP skill is Jared Weaver, who gets a ridiculous BABIP on his FB given his swing-and-miss rate. I think that's a function of the deception in his delivery, where the movement on the pitch doesn't match the upper arm angle.

One thing a metric like SIERA will allow us to do is identify the consistent under- and over-performers better than past metrics, and then we can examine them with pitch/fx and the like. Those findings may never be included in the metric but would allow us to determine after a single year's over- or under-performance whether a pitcher might be one of the rare guys who has a true, non-SIERA measurable BABIP or HR/FB skill (or lack of same).


philosofool

2/10

Are you sure that this positive correlation between BB% and "HR rate" isn't an artifact of some sort? I'm worried that once you control for other factors like GB%, it goes away. What do you mean by HR rate? HR/FB? HR/9? (Sinker ballers often don't have the same control skills as other pitchers; this may be because it's harder to control sinking pitches or perhaps it's just that a high GB% means you can have greater success with weaker control, so they just get away with it. I don't know. Such a fact might explain the correlation between BB% and HR rate.)


ericmvan

2/11

Not an artifact. Same pitchers, look at year-to-year changes in BB rate, and HR / Contact follows very mildly but with immense statistical significance. See below for the details.

When not using HR / FB (i.e., in the many cases where FB data is unavailable), you should always use HR / Contact, which is not only the most logical but also has a stronger year-to-year correlation than HR with any other denominator I've looked at.


DrDave

2/10

Eric, you raise a very important distinction when you mention skill versus approach. We really don't have any idea at all what the effect on ERA would be for a particular individual pitcher to nibble more (or less), given his skill set.

I don't see any way for this new metric to help with that question, but then I haven't really thought hard about it yet. If you've already seen it, please share.


nosybrian

2/10

Your last para is especially valuable, because it reminds us that these metrics can't cover every single contingency, and that while there is value in adding complexity to the measure there's also value in keeping it from being "overfitted" to every single contingency. In the end, the "residuals" such as those you mention will be instructive, but shouldn't necessarily lead to making the indicator itself even more complex.


swartzm

2/10

Eric M. Van-- I'm not sure that there is a correlation between BB% and HR%. I'm finding only -0.03 in my data set. There seems to be a correlation between doubles and walks, and between doubles and home runs, but not between walks and home runs.

I do think SIERA should do a better job of accounting for these kinds of correlations, especially when they affect run scoring differently as a set of skills rather than as a sum of their parts, but I'm just not sure that BB% and HR% are really correlated in general based on the data I'm looking at (2003-09 pitchers here).


ericmvan

2/11

Change in HR/Contact, adjusted for age and for any change in role, correlates to change in (BB-IBB)/(PA-IBB), r = .130, p = .000015 (n = the 1107 pitchers who faced 200+ BFP in consecutive seasons for the same team playing in the same park, 2002-2009).

(Without the adjustments, r = .126, p = .000026.)


swartzm

2/11

Maybe GB/FB changes correlate with (BB-IBB)/(PA-IBB) changes? I could certainly see two skills' deterioration being correlated. I guess many pairs of skills generally would as people aged.


philosofool

2/10

"It is very true that FIP will provide a better estimate of a pitcher’s skill level than his ERA, because the latter is open to bloop hits or nabbed line drives."

This is certainly correct, but I want to point out that it isn't the only reason that FIP (and other defense independent pitching statistics) is better than ERA. Because the nature of a ball in play is to be an out about 69% of the time, it is improbable that a pitcher will give up three consecutive hits, but it will happen. When a pitcher gives up three consecutive hits, it is likely that he allows a run. ERA is sensitive to such events, and will under-estimate the talent of a pitcher who is victim to this essentially random event. Another issue is that pitchers are not especially more likely to give up a hit with 2 outs and no men on than with no outs and no men on, but a pitcher is much more likely to allow a run in the latter case. The moral of this story is that actual runs allowed will tend to track factors beyond a pitcher's control, like whether the hit allowed was with no outs or with two. But defense independent estimators will ignore such facts and consequently, when well constructed, will be better estimators of pitcher skill.


hotstatrat

2/10

I am noticing that PECOTA's projected ERA, including PERA, and EqERA have much higher estimates than CHONE's and Ken Warren's projected ERAs on pitchers with very high GB/FB. Is that something that needs to be adjusted/updated? Will PECOTA start gearing their ERA projections more towards SIERA based projections next year?


hotstatrat

2/10

(I don't mean to insinuate that PECOTA is less right than CHONE, etc. - just wondering if there is a GB/FB component to its player matching and if it has been checked as to whether that would improve it or not - and if it would how strongly that characteristic should be used. I realize GBs haven't even been measured for very long, so I guess there would have to be some translation and migration from HR rate data to GB/FB data.)


swartzm

2/10

I think PECOTA does use information on groundout/flyout ratio, but I'm not totally sure. I think SIERA will be more involved next year in the PECOTA process but I'm not sure really where it fits in. I do think that batted ball statistics that have only been collected properly since 2003 would help all projection systems immensely, but that it might be tough to incorporate some of that accumulated knowledge into a model like PECOTA without throwing out 50 years of other data it uses effectively too.


nosybrian

2/10

Yes, PECOTA does have a FB/GB adjustment for pitchers.

http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=38


myshkin

2/10

I like the ideas so far, but I have something of a nit-pick, but I still regret not making any fuss when MGL and others started using "regress" as a transitive verb...

I don't understand your usage of "un-foil." I hadn't heard that term before this series of articles. Some web searching indicates that it's typically used to describe factoring a second-degree polynomial into two first-degree polynomials. I guess it's inspired by "FOIL," the acronym for "First, Outer, Inner, Last," or a means of remembering how to multiply two first-degree polynomials. You aren't factoring, though, but expanding the QERA polynomial product. So I'm confused. :)


goldenyeti

2/10

From http://www.thefreedictionary.com/regress

regress
vb [rɪˈgrɛs]
1. (intr) to return or revert, as to a former place, condition, or mode of behaviour
2. (Mathematics & Measurements / Statistics) (tr) Statistics to measure the extent to which (a dependent variable) is associated with one or more independent variables

It's not the OED, but I think it indicates that this is pretty standard usage.


swartzm

2/10

Yeah, maybe we shouldn't have used the word unfoil. It was supposed to be somewhat of a play on first-inner-outer-last, but really we should have stuck with unravel. I think it'll be clearer in today's (Wednesday's) article, though we probably used the word again IIRC.

Regress is pretty standard as a verb at this point, I think, but it may not be perfectly used all the time within sabermetrics.


hotstatrat

2/10

I am getting a little disenchanted with GB/FB data. It seems these numbers vary even more wildly from season to season by the same pitcher than HR/9. What's going on?


swartzm

2/10

Hmm...there must be something wrong with your data source. GB/FB, GB/Batted Ball, FB/Batted Ball-- these all have correlations of something like .70-.80 year-to-year. I'm pretty sure they are more persistent than even strikeout rates for pitchers. The thing is that you cannot use data before 2003. It's possible that you are looking at Groundout/Flyout data, but even that should be reliable. HR/9, on the other hand, has a year-to-year correlation of something like .2, and that breaks down when you net out team effects and do HR/outfield flyball.


hotstatrat

2/10

Thanks, Matt. I am looking at the GB/FB column data on FanGraphs. Perhaps I am being fooled by seasons with a tiny sample size, as each level (minors, majors) has a separate line, though I don't think so. I am making my own projections from career data counting the most recent seasons the heaviest, so I am checking each line's inning total to assess the significance of its corresponding GB/FB data. Hence, I can see how a pitcher's GB/FB jumps from 1.50 to .75 and back. In GB% that would be 60% and 43%, if GB% = GB / (GB+FB), so I see that the percentage change in GB/FB is far greater than it is in GB%, despite the fact that they are both measures of ground ball tendency. If you say that both year-to-year correlations are in the same .7 - .8 range, I guess that difference gets ironed out in the calculation. However, from all my work, those jumps in FG's GB/FB look a heck of a lot heftier than the year to year changes in K/9 or K/BB, but I am not looking at those other stats as closely, because projections are already provided.


swartzm

2/10

FB on fangraphs includes pop-ups and outfield fly balls but NOT line drives, so that seems to be muddying things a little. I get .75 for GB% year-to-year correlation for 2005-08 data on pitchers with at least 30 IP both years. GB/FB gives me .78.
