Introducing SIERA: Part 5

February 12, 2010

A helpful comment on one of the previous articles alerted us that our park-adjustment method was not quite correct, so we have updated the formula accordingly for this article and in the glossary. The formula changes only slightly, and the test results from the previous article are nearly the same as well. However, for the sake of transparency, we are highlighting this at the beginning of the article. The new formula:

SIERA = 6.145 – 16.986*(SO/PA) + 11.434*(BB/PA) – 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/- 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) – 5.195*(BB/PA)*((GB-FB-PU)/PA)
where the +/- in front of the squared ground-ball term is, as before, a negative sign when (GB-FB-PU)/PA is positive and a positive sign when it is negative.
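
As a minimal sketch, the formula above can be written as a small Python function. The function name `siera` and its argument names are ours, for illustration only; the coefficients are the ones stated above. The +/- on the squared ground-ball term is encoded as `-6.664 * g * abs(g)`, which is negative when (GB-FB-PU)/PA is positive and positive when it is negative. This is one way to express the rule, not necessarily how the authors compute it.

```python
def siera(so, bb, gb, fb, pu, pa):
    """Sketch of the updated SIERA formula from raw counts.

    Argument names are illustrative: strikeouts, walks, ground balls,
    fly balls, pop-ups, and plate appearances.
    """
    k = so / pa               # strikeout rate per PA
    w = bb / pa               # walk rate per PA
    g = (gb - fb - pu) / pa   # net ground-ball rate per PA

    return (
        6.145
        - 16.986 * k
        + 11.434 * w
        - 1.858 * g
        + 7.653 * k ** 2
        # Signed square: -6.664*g^2 when g > 0, +6.664*g^2 when g < 0.
        - 6.664 * g * abs(g)
        + 10.130 * k * g
        - 5.195 * w * g
    )


# Hypothetical counts chosen only to give a 12.1% K-rate, a 3.1% BB-rate,
# and equal ground balls and (FB + PU), so the net ground-ball rate is zero.
neutral = siera(so=121, bb=31, gb=340, fb=300, pu=40, pa=1000)
print(round(neutral, 2))
```

With a net ground-ball rate of zero, every ground-ball term drops out, and the result rounds to 4.56, the figure quoted later in this article for a version of Pineiro with a neutral batted-ball mix.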

Over the last several days, we have covered plenty of introductory ground with regard to the ERA estimator SIERA, from explaining its evolution to demonstrating the creation of its formula. In Part 4, we tested the metric in several different areas against other estimators and found that the goals set forth at the outset of this process were attained: SIERA proved better at predicting park-adjusted ERA in the same season than any other estimator that appropriately treats HR/FB rates as luck, and it proved better at predicting park-adjusted ERA in the subsequent season than any estimator, regardless of its HR/FB treatment.

The final piece of this puzzle involves a bit of statistical exploratory surgery in order to explain why it fared better than its competition. While some may suggest that SIERA represents a mere five-percent improvement on QERA, the reality of the matter is that the metric vastly improves our understanding of certain types of pitchers consistently undervalued or overvalued by the current crop of estimators. This concept may feel largely abstract, so our time will be spent on exemplifying what we mean through archetypal personifications.

Recall that this method of estimation corrects a few problems inherent elsewhere, such as inconsistent denominators or the inclusion of insignificant regression terms. SIERA uses (GB-FB-PU)/PA to keep ground-ball rate normalized by PA, as walk and strikeout rates are, so pitchers who allow an abnormal percentage of balls to be put into play can be better estimated. Similarly, we can also better estimate the ERAs of pitchers capable of inducing a surplus of worm beaters, who are therefore able to erase the inevitable ground-ball singles with double plays. SIERA also better estimates ERA for pitchers who are vulnerable to home runs but above average at preventing baserunners, which limits dinger damage.

In the coming paragraphs, you will be visited by three pitchers, representative of the archetypes that other estimators consistently miss but that SIERA now captures correctly.

Joel Pineiro

Pineiro had a 3.56 park-adjusted ERA last season to go along with a 3.1 percent BB-rate (1.14 BB/9), 12.1 percent K-rate (4.42 K/9), and somewhere around a 61.3 GB% depending on the source. Consider the various ERA estimators and their attempts at approximating his runs allowed per nine frames:

ERA = 3.56
SIERA = 3.60
FIP = 3.27
xFIP = 3.76
QERA = 3.96
tRA = 3.42

What should stand out initially is that Pineiro is precisely the kind of pitcher that exacerbates the denominator problem discussed earlier. He allows 84 percent of hitters to put the ball into fair territory, compared to the league average of 73 percent. Therefore, having a high ground-ball rate is particularly important and his well, well above-average rate is very helpful. However, QERA only considers the average effect of ground balls on ERA and underestimates the importance for someone in Pineiro’s situation.

For instance, if Pineiro maintained his 3.1% BB-rate and 12.1% K-rate, but had a 40%/20%/40% breakdown of GB/LD/(FB+PU), then his SIERA would be 4.56 and his QERA would be 4.55. Instead, going from 40 percent to 61 percent ground balls gives Pineiro an extra boost, as that 21-percent difference is part of a much larger set of balls in play.
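
To make the denominator point concrete, here is a small sketch that converts a batted-ball mix into the per-PA net ground-ball rate that SIERA uses. It assumes the ground-ball, line-drive, and fly-ball-plus-pop-up shares sum to 100 percent of balls in play, so Pineiro's (FB+PU) share is taken as the remainder after his 61.3 percent ground balls and the 15.8 percent line-drive rate cited later in this article. The function name is ours.

```python
def net_gb_per_pa(gb_share, fb_pu_share, bip_per_pa):
    """Net ground-ball rate per PA from batted-ball shares of balls in play."""
    return (gb_share - fb_pu_share) * bip_per_pa


GB_SHARE = 0.613                        # Pineiro's ground-ball share
LD_SHARE = 0.158                        # his line-drive share
FB_PU_SHARE = 1 - GB_SHARE - LD_SHARE   # assumed remainder: fly balls + pop-ups

# Same batted-ball mix, two different rates of balls in play per PA.
pineiro = net_gb_per_pa(GB_SHARE, FB_PU_SHARE, bip_per_pa=0.84)
league_contact = net_gb_per_pa(GB_SHARE, FB_PU_SHARE, bip_per_pa=0.73)

# The 40%/20%/40% mix nets out to zero at any contact rate.
neutral = net_gb_per_pa(0.40, 0.40, bip_per_pa=0.84)

print(round(pineiro, 3), round(league_contact, 3), neutral)
```

Under these assumptions, the same batted-ball mix yields a net ground-ball rate of about 0.32 per PA at Pineiro's contact rate versus about 0.28 at a league-average one, which is the extra boost a PA-based denominator picks up.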

Using xFIP leads to another problem in that Pineiro walks so few batters that he has fewer runners on base in the rare occurrences that he does give up home runs. FIP and xFIP multiply home runs and expected home runs, respectively, by 13 and divide by innings pitched. That treats the effect of blasts as constant for all pitchers, but the damage for someone like Pineiro is largely counteracted by his lack of ducks on the pond. Pineiro does allow plenty of ground balls, though, which can lead to singles, but he allows enough of them to the point that he frequently doubles runners off, as he induced 29 double plays last year.

This is the kind of adjustment that SIERA makes, as it allows for a quadratic term on ground balls. Both QERA and xFIP are clearly high for the reasons above, but FIP is nearly as close as SIERA with Pineiro and is on the low side, for the reason that Pineiro only allowed 6.5 percent home runs per fly ball, below the league average. This explains why tRA does a little better than SIERA for same-year ERA predictions, as it attributes this performance aspect, which is largely luck-laden due to its inconsistency, to skill in addition to crediting him for his low 15.8 percent line-drive rate, which we know is also largely luck-driven. Along the same line of reasoning, this is why SIERA bests all others at predicting park-adjusted ERA in the following year, as it expects such lucky marks to regress.

Johan Santana

Santana is another example of a pitcher who is much more accurately estimated through SIERA. From 2004-09, his average ERA (not weighted, just a quick average of his six ERAs) was very low, but estimators perpetually overestimated what his ERA should have been. The estimators produced the following in that same span:

ERA = 2.93
SIERA = 3.03
FIP = 3.55
xFIP = 3.61
QERA = 3.19
tRA = 3.15

The real issue is that FIP and xFIP are too bearish on home runs, neglecting to realize that his prowess at whiffing batters and avoiding walks mitigates the damage even if he lacks the ground-balling tendencies of other star pitchers; as with Pineiro, fewer baserunners lead to fewer multi-run blasts. QERA is not specific enough about the interactions to properly nail down the ERA estimation. By correcting for these shortcomings, SIERA comes much closer, and Santana exemplifies the impact it brings to the table.

To see this, let’s look in more detail at Santana’s actual home runs from 2004-09. He gave up 146 blasts, but 72 percent of them were solo shots, while the league average was 57 percent. Santana allowed 1.33 runs on the average home run, 17 percent lower than the league’s 1.59 runs on the average homer! Had the coefficient on home runs in FIP been 17 percent lower, his FIP would have dropped by 0.24, making up nearly half of the difference.
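
The size of that adjustment can be checked with a short sketch. It uses the FIP home-run weight of 13 per inning pitched mentioned above and the article's runs-per-homer figures. Santana's innings total is not given in the article, so `ASSUMED_IP` below is a round placeholder supplied purely for illustration; the function and constant names are also ours.

```python
FIP_HR_WEIGHT = 13  # home-run weight in FIP, as described in the article


def hr_discount(pitcher_runs_per_hr, league_runs_per_hr):
    """Fraction by which a pitcher's average homer is cheaper than the league's."""
    return 1 - pitcher_runs_per_hr / league_runs_per_hr


def fip_drop(home_runs, innings, discount):
    """Change in FIP if the home-run weight were scaled down by `discount`."""
    return FIP_HR_WEIGHT * discount * home_runs / innings


ASSUMED_IP = 1300  # placeholder innings total, NOT taken from the article

discount = hr_discount(1.33, 1.59)          # Santana vs. league, 2004-09
drop = fip_drop(146, ASSUMED_IP, discount)  # 146 home runs allowed
print(round(discount, 3), round(drop, 2))
```

From the rounded inputs the discount comes out near 16 percent rather than exactly 17, presumably a rounding difference, and with the assumed innings total the drop lands close to the 0.24 quoted above. A different innings figure would shift the result proportionally.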

Brandon Webb

Webb is known for his extreme ground-balling ways, and since SIERA’s ground-ball percent terms have a negative quadratic on ERA, the more grounders induced, the greater the impact on run prevention. This is unlike others, which treat grounders as constant or offer diminishing returns as the rates get higher. The rationale is that ground balls lead to a lot of singles, but those singles can be erased by double plays. Consider the following chart, showing Webb’s 2004-08 numbers:

ERA = 3.01
SIERA = 3.18
FIP = 3.45
xFIP = 3.39
QERA = 3.63
tRA = 3.61

Webb’s SIERA is far closer to his ERA during 2004-2008. Those double plays were a major reason why this happened. Webb got 0.96 DP/9 IP, while the league averaged 0.81. This is more impressive, considering his WHIP was 1.24 to the league’s 1.40. His ratio of double plays-to-base-runners was .0860, well above the league’s .0646. Webb was somewhat vulnerable to singles due to his ground-balling ways. He allowed singles on 78.2 percent of hits in play compared to the league average of 74.7 percent, but he was less vulnerable to extra-base hits and was able to erase those runners back off the basepaths with double plays.
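
The double-play ratio can be roughly reproduced from the rates quoted above. This sketch assumes that "base runners" means walks plus hits, so that baserunners per nine innings equal WHIP times nine; the article does not spell out its definition, and the function name is ours.

```python
def dp_per_baserunner(dp_per_9, whip):
    """Double plays per baserunner, counting only walks and hits as baserunners."""
    baserunners_per_9 = whip * 9
    return dp_per_9 / baserunners_per_9


webb = dp_per_baserunner(dp_per_9=0.96, whip=1.24)
league = dp_per_baserunner(dp_per_9=0.81, whip=1.40)
print(round(webb, 4), round(league, 4))
```

Webb's figure matches the .0860 above. The league figure from these rounded inputs comes out slightly below the article's .0646, which is consistent with the published rates having been rounded.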

This last point is important, because SIERA adjusts for a common criticism of DIPS metrics, since it is based on a regression output. Many people, on first hearing about DIPS, wonder whether certain pitchers are more prone to doubles and triples on balls in play, even if they do not allow any more hits. As the pitchers who fit these criteria would presumably not be ground-ball pitchers (since ground balls have lower ISO on balls in play than fly balls do), a regression analysis will credit ground-ball pitchers for this double- and triple-prevention skill by giving them lower ERAs if fellow ground-ball pitchers have had lower ERAs for this reason, too.

Conclusion

Pineiro, Santana, and Webb are clear examples of where the benefits of using SIERA to estimate ERA lie. Other estimators misjudged the ERA of each of them. Pitchers like these demonstrate why SIERA predicted next-year ERA better than any of the other available estimators. Developing better baseball statistics is not an academic exercise. It is a way to better value what happens on the field, allowing analysts to better understand and predict performance and front offices to build better teams.

Eric Seidman

Matt Swartz

NathanJM

2/12

So... the obvious question after seeing a list of "better-than-the-rest-at"... are there particular pitcher types that SIERA has trouble with?

seanpotter

2/12

I'd like to calculate some pitcher SIERAs myself using FanGraphs data. They include pop-ups in their fly balls number. Should I negate that to avoid redundancy?

sunpar

2/12

I'm slightly embarrassed to have waited until part 5 to ask this, but what does "PU" stand for in the ground-ball calculation?

I had thought it was pop-ups.

cjones06

2/12

It is pop-ups.

MHaywood1025

2/12

What is the probability a ground ball goes for a hit, given there is a man on first? It makes sense that it is different than the probability it goes for a hit with nobody on, but is it really significant enough to consider it a good thing for run prevention?

You say that ground balls lead to a lot of singles, so why does this change so significantly with a man on first? Wouldn't it be close to the same percentage of ground ball singles with nobody on, and thus more often advance the runners than create a double play?

sunpar

2/12

The probability of a single doesn't change significantly with a man on. Obviously if you have a guy on first who attempts a ton of steals, you may increase ground ball BABIP a bit since the SS or 2B may be forced to cover, but this difference is probably not significant, and will likely be caught by the regression anyway.

All the authors are saying is that a ground ball with men on is preferable to a fly ball, because it will score fewer runners (doubles and triples and HRs will score more runners than a ground ball) while offering a greater chance of erasing runner via the double play.

The interesting question then, is whether a ground ball is ever preferable to a strikeout in this regression. It shouldn't be, as a strikeout will always be more run-preventing than a ground ball regardless of men on base (because, as you say, it can always be a single and advance/score runners), but it would be an interesting test of the regression to see if it breaks down at some point.

swartzm

2/12

My browser isn't letting me reply to individual comments, so I will reply in bulk to the first three I see--

@NathanJM: I think that like most DIPS metrics, it will do particularly badly for pitchers who should not be in the majors or should be on the DL. These pitchers probably would be likely to have higher line-drive rates and higher HR/FB. After all, if I was in the majors, I would have higher LD% and HR/FB. I think FIP will do better at these high extremes because it will punish the injured or unqualified pitcher for the HR/FB instead of treating it as luck like SIERA would.

@seanpotter: PU is for popups, so in the FanGraphs data, you can just use their FB because that is outfield flies plus infield flies. Just replace our (FB+PU) using BP's stats with FanGraphs' FB stat.

@MHaywood1025: looking at 2009, hitters had a .297 BABIP with none on and .304 with men on first, which works out to about 274 more hits with men on first than there would have been at the bases-empty rate. There were 3,494 double plays with men on first. So while the effect you mention is true, it's a much smaller effect than the double play effect.
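
The comparison in that last reply can be laid out as arithmetic. This sketch reads the 274 extra hits as the BABIP gap multiplied by the number of balls in play with a man on first, which is our interpretation of the comment rather than something it states; the variable names are ours.

```python
BABIP_NONE_ON = 0.297
BABIP_MAN_ON_FIRST = 0.304
EXTRA_HITS = 274        # extra hits attributed to the BABIP gap
DOUBLE_PLAYS = 3494     # double plays with a man on first, 2009

babip_gap = BABIP_MAN_ON_FIRST - BABIP_NONE_ON

# Balls in play with a man on first implied by the two figures above.
implied_bip = EXTRA_HITS / babip_gap

# How many double plays there were for each extra hit.
dp_per_extra_hit = DOUBLE_PLAYS / EXTRA_HITS

print(round(implied_bip), round(dp_per_extra_hit, 1))
```

On this reading, the figures imply roughly 39,000 balls in play with a man on first and between 12 and 13 double plays for every extra hit, which is the sense in which the double-play effect dominates.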

sunpar

2/12

Heh, I was just thinking it might be fun to measure the HR/FB ratio in the HR Derby as an upper bound for how much of it could possibly be due to pitcher skill. :)

lopkhan00

2/12

Frankly, what this shows me is that (assuming that SIERA is the new gold standard, and I'm willing to accept that) ERA, adjusted for park influence, has always been a damn good stat and was closer to reality than all the previous attempts to improve upon it.

sunpar

2/12

That's not quite what is being shown here. The ultimate goal is to figure out how much a pitcher's performance is his alone and how much of it is defense/luck. ERA will always measure both, but we want to separate the two. Over the long run, defensive and luck-based contributions should be zero and thus pitchers' ERA should be close to their fielder and luck independent ERA.

He just picked out 3 players whose ERAs were consistently being mis-estimated by other metrics and who (un-coincidentally) had "extreme" skills-- Johan with the high K-rate and high fly ball rate, and Webb/Pineiro with the high GB rates.

For these 3 pitchers (and especially for Santana and Webb, who have 4-5 years of data), the other metrics are consistently underestimating their contributions. SIERA, presumably, does a better job of measuring their skill-based contributions than the other metrics.

nosybrian

2/12

The "presumably" in your final sentence could be tested explicitly by looking at the "next year" predictions for these pitchers. While the article compares the average "same year" predictions for the different metrics, I'm not sure it shows us that SIERA is significantly better in predicting the "next year" for these pitchers, given the previous year or "base years" (typically this involves a weighted average of previous three years).

flalaw

2/13

Yeah, I'd like to see SIERA applied to 2008 numbers to see how well it predicted 2009 vis-a-vis FIP, QERA, etc.

swartzm

2/13

I put that in the comments of the last article. With the updated formula, though, it's basically a tie between FIP and SIERA (1.108 for FIP and 1.107 for SIERA, with QERA at 1.185, xFIP at 1.226, and tRA at 1.307).

Keep in mind, though, that SIERA was a regression on SAME-YEAR park-adjusted ERA, so it is not biased on those coefficients. Using this, the new coefficients and park factors have the final tally at: SIERA at 1.158, FIP at 1.198, QERA at 1.258, xFIP at 1.331, and tRA at 1.202.

swartzm

2/13

lopkhan00: With all due respect, I have absolutely no idea how you could come to this conclusion from this article. The 4th article showed that ERA doesn't do nearly as well at predicting future ERA as any of the estimators, and in this article, ERA was the baseline to check the other estimators against. I can't follow your thought process at all, but please expand if I'm missing something.

JinAZReds

2/13

Matt,

What are your thoughts on using SIERA on team statistics? Are some of the interactions it captures valid interpretations at the team level? I think, in general, it would probably hold up well, but I'm curious what you think about this.

Cheers,
Justin

swartzm

2/13

Justin, what we've been doing is finding out individual pitchers' SIERAs and weighting them by IP, so it would be like getting Skill-Interactive Earned Runs for each pitcher, adding them up, dividing them by team innings, and multiplying by 9. You want the interactions to be relevant for the relevant pitchers.
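
A minimal sketch of that aggregation, under the assumption that each pitcher's Skill-Interactive Earned Runs are his SIERA times his innings divided by nine. The function name and the two-pitcher staff below are hypothetical and only illustrate the weighting.

```python
def team_siera(pitchers):
    """IP-weighted team SIERA from (siera, innings_pitched) pairs."""
    # Convert each pitcher's rate into skill-interactive earned runs.
    total_runs = sum(siera * ip / 9 for siera, ip in pitchers)
    total_ip = sum(ip for _, ip in pitchers)
    # Convert the staff total back into a per-nine-innings rate.
    return 9 * total_runs / total_ip


# Hypothetical staff: (SIERA, IP) for each pitcher.
staff = [(3.00, 200), (4.50, 100)]
print(round(team_siera(staff), 2))
```

Because the pitcher with 200 innings counts twice as much as the one with 100, this hypothetical staff comes out at 3.50 rather than the simple average of 3.75.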

mgolovcsenko

2/14

So is SIERA now being employed in the 2010 PECOTA projections? How?

BurrRutledge

2/15

And/or the PFM?

shanecris

2/14

Can we get SIERA projections for all pitchers? For fantasy purposes that would be terrific.

kdringg

2/18

I would like to second this request...please.

GoodKingJohn

4/25

from the equation, we have 7.653*((SO/PA)^2). however, as k's go up, this value would go up, and result in a LARGER SIERA value. does this make sense?
thanks

swartzm

5/10

There is also a negative linear term in there for regular strikeouts that will always dominate. You would need to have over 100% strikeouts to have the positive squared term have a bigger effect than the negative linear term.

Think of it this way: ERAs can't go below 0. The positive squared term slows the run prevention down when there weren't many runners on in the first place.
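
The point about the linear term dominating can be checked with a short sketch. It looks only at the two pure strikeout terms of the formula and ignores the strikeout-by-ground-ball interaction, which amounts to assuming a net ground-ball rate of zero; the function and constant names are ours.

```python
LINEAR = -16.986   # coefficient on SO/PA
SQUARED = 7.653    # coefficient on (SO/PA)^2


def strikeout_terms(k):
    """Combined contribution of the two pure strikeout terms at K-rate k."""
    return LINEAR * k + SQUARED * k ** 2


def marginal_effect(k):
    """Change in SIERA per unit of K-rate (derivative of strikeout_terms)."""
    return LINEAR + 2 * SQUARED * k


# K-rate at which one more strikeout would stop lowering SIERA.
turning_point = -LINEAR / (2 * SQUARED)

for k in (0.10, 0.20, 0.30):
    print(k, round(strikeout_terms(k), 3), round(marginal_effect(k), 3))
print(round(turning_point, 2))
```

The turning point sits above a strikeout rate of 1.0, so for any achievable rate, more strikeouts still lower SIERA; the squared term only makes each additional strikeout worth a little less than the one before.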

sdgeiger

8/23

Matt,

In your view, are WAR and ERA+ as valuable as SIERA?

-Scott

swartzm

8/30

They are all valuable in different ways, and there are also three versions of WAR/WARP.

Two main distinctions are important:
1) Measuring outcomes vs. Measuring skills
2) Measuring cumulative performance vs. measuring rate of performance

FanGraphs WAR uses FIP, which is like SIERA in that it measures a skill, but unlike SIERA in that it measures how well you did cumulatively instead of as a rate of performance like SIERA (which approximates earned runs per nine innings).

BP WARP and Baseball-Reference WAR measure outcomes as a cumulative performance, while ERA+ measures outcomes as a rate of performance. WARP and WAR measure outcomes relative to a replacement player, while ERA+ measures outcomes relative to average.

Hope that helps!

GoodKingJohn

3/26

revisiting siera part 5.
I am trying to find numbers for player, and get values that are way out of whack.

when you say that the +/- is as before, regarding the GB-FB-PU, is this for ALL of the terms that include GB-FB-PU? (you only have one +/- in the formula, yet GB-FB-PU appears numerous times.)

but sadly, even allowing for the +/- for all the terms I am still getting crazy values.
can somebody help?
thanks
