Tuesday, October 28, 2014
How impressive is WAR?
Poz has a long piece on Bill James. He quotes Bill on his view of WAR:
Well, my math skills are limited and my data-processing skills are essentially nonexistent. The younger guys are way, way beyond me in those areas. I’m fine with that, and I don’t struggle against it, and I hope that I don’t deny them credit for what they can do that I can’t.
But because that is true, I ASSUMED that these were complex, nuanced, sophisticated systems. I never really looked; I just assumed that the details were out of my depth. But sometime in the last year I was doing some research that relied on these WAR systems, so I took a look at them, and … they’re not very impressive. They’re not well thought through; they haven’t made a convincing effort to address many of the inherent difficulties that the undertaking presents. They tend to get so far into the data, throw up their arms and make a wild guess. I don’t know if I’m going to get the time to do better of it, or if it will be left to others, but … we’re not at anything like an end point here. I assumed that these systems were a lot better than they actually are.
There are things I agree with and things I don't.
1. I do agree that WAR is not impressive, or at least not impressive-looking. That's the beauty of its design. For example, here is what WAR for pitchers looks like at its core:
IP/9 x (lgERA + 1 - ERA) / 10
If your pitcher has a 3.00 ERA in a league that averages 4.00, and he throws 225 IP, you get this:
225/9 x (4 + 1 - 3) / 10 = 5 wins
(That divide by 10 is simply the runs-to-wins converter: roughly 10 runs equal one win.)
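A minimal Python sketch of that core formula, assuming RA/9 is plugged in where the formula says ERA and using the 10-runs-per-win converter. The function name pitcher_war is hypothetical, and this is only the skeleton, with no park, league, or role adjustments.

```python
def pitcher_war(ip: float, pitcher_ra9: float, lg_ra9: float,
                runs_per_win: float = 10.0) -> float:
    """Core pitcher WAR: IP/9 x (lgERA + 1 - ERA) / 10 (hypothetical name)."""
    # Runs saved per 9 innings versus a replacement-level pitcher,
    # who allows 1 more run per 9 than the league average.
    runs_per_9_above_replacement = lg_ra9 + 1.0 - pitcher_ra9
    # Scale by innings (counted in 9-inning games), then convert runs to wins.
    return (ip / 9.0) * runs_per_9_above_replacement / runs_per_win


# The worked example: 225 IP, 3.00 against a 4.00 league.
print(pitcher_war(225, 3.00, 4.00))  # 5.0
```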
And here's a little secret: this was invented by... Bill James! In his classic(*) article on the MVP race with Clemens and Mattingly, he goes through the machinations, including doing that "+1" bit, which is actually the most important part of this equation. Without the "+1" part, it becomes Wins Above Average, which is how Pete Palmer presented it in The Hidden Game. The +1 part turns it into Wins Above Replacement.
(*) Most of his articles are classic, so I'm not really narrowing down the list.
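To see why the "+1" matters so much, here is a small Python comparison of the two versions: Wins Above Average (no +1) and Wins Above Replacement (with +1). The function names are hypothetical. The gap between them is always IP/90, so for a 225-inning pitcher the +1 alone is worth 2.5 wins, and even an exactly league-average pitcher who eats those innings comes out at 2.5 WAR.

```python
def wins_above_average(ip, pitcher_ra9, lg_ra9, runs_per_win=10.0):
    # Palmer-style baseline: a league-average pitcher.
    return (ip / 9.0) * (lg_ra9 - pitcher_ra9) / runs_per_win


def wins_above_replacement(ip, pitcher_ra9, lg_ra9, runs_per_win=10.0):
    # The "+1": replacement level is one run per 9 worse than average.
    return (ip / 9.0) * (lg_ra9 + 1.0 - pitcher_ra9) / runs_per_win


# The 3.00 pitcher from the example, then a league-average pitcher, both at 225 IP.
print(wins_above_average(225, 3.00, 4.00), wins_above_replacement(225, 3.00, 4.00))  # 2.5 5.0
print(wins_above_average(225, 4.00, 4.00), wins_above_replacement(225, 4.00, 4.00))  # 0.0 2.5
```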
2. As for their not being well thought through, I do not agree with Bill at all. They are actually incredibly well thought through. Again, just as an example, the distinction between starting pitchers and relief pitchers is huge. This is something that baseball people inherently understand, but that those of us studying the data dismissed or ignored for the longest time.
We just couldn't explain why a 3.50 ERA by a starting pitcher was far better than a 3.50 ERA by a relief pitcher, and it goes beyond just the volume of innings. Keith Woolner was one of the first to bring this up over a decade ago, and others followed suit, me included, notably in The Book. This research evolved over time to the point where I gave it a rule, the Rule of 17, which basically says that a relief pitcher gets 17% more strikeouts, allows 17% fewer HR, has a BABIP 17 points lower, and allows 17% fewer runs (walks are flat).
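A rough Python sketch of running the Rule of 17 in reverse, to put a reliever's rate line on a starter's scale. It assumes each "17%" is a multiplier on the rate the same pitcher would post as a starter (so relievers' K rate is x1.17, HR rate and RA/9 are x0.83), and that "17 points" of BABIP means .017. The class, function, and input numbers are all hypothetical. The output shows that a reliever's 3.50 is more like a 4.22 from a starter.

```python
from dataclasses import dataclass


@dataclass
class RateLine:
    k_rate: float   # strikeouts per batter faced
    bb_rate: float  # walks per batter faced
    hr_rate: float  # home runs per batter faced
    babip: float    # batting average on balls in play
    ra9: float      # runs allowed per 9 innings


def reliever_to_starter(r: RateLine) -> RateLine:
    """Undo the Rule of 17 (assumed multiplicative) to estimate a starter-equivalent line."""
    return RateLine(
        k_rate=r.k_rate / 1.17,    # relievers strike out 17% more
        bb_rate=r.bb_rate,         # walks are flat
        hr_rate=r.hr_rate / 0.83,  # relievers allow 17% fewer HR
        babip=r.babip + 0.017,     # relievers' BABIP runs 17 points lower
        ra9=r.ra9 / 0.83,          # relievers allow 17% fewer runs
    )


# Hypothetical reliever with a 3.50 RA/9.
reliever = RateLine(k_rate=0.25, bb_rate=0.08, hr_rate=0.025, babip=0.280, ra9=3.50)
starter_equiv = reliever_to_starter(reliever)
print(round(starter_equiv.ra9, 2))  # 4.22
```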
There's the standard thing we do with park effects, as well as the difference in AL/NL talent, so that "lgERA" is really adjusted for all of that. Some even go so far as to look at the actual opponents faced and the fielders behind the pitcher to further adjust that lgERA. (Note: when I say ERA, I really mean RA/9; ERA is just so ubiquitous a term. Focusing on all runs allowed, rather than the made-up earned runs, is itself another advance.)
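A minimal Python sketch of those two ideas together: RA/9 computed from all runs allowed (earned or not), and a league baseline adjusted for context. The multiplicative park_factor and league_factor are assumptions for illustration only (1.00 means neutral); real systems build these adjustments in their own ways, and the names and numbers here are hypothetical.

```python
def ra9(runs_allowed: float, ip: float) -> float:
    # All runs allowed, earned or unearned, per 9 innings.
    return 9.0 * runs_allowed / ip


def adjusted_lg_ra9(lg_ra9: float, park_factor: float = 1.0,
                    league_factor: float = 1.0) -> float:
    # Hypothetical multiplicative adjustments: a park_factor above 1.0 marks a
    # hitter-friendly park, raising the baseline the pitcher is judged against;
    # league_factor stands in for an AL/NL talent adjustment.
    return lg_ra9 * park_factor * league_factor


pitcher = ra9(runs_allowed=75, ip=225)              # 3.0
baseline = adjusted_lg_ra9(4.00, park_factor=1.05)  # about 4.2
war = (225 / 9.0) * (baseline + 1.0 - pitcher) / 10.0
print(round(pitcher, 2), round(baseline, 2), round(war, 2))  # 3.0 4.2 5.5
```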
3. The wild guess part could be true, but I wouldn't call it a wild guess so much as a necessary guess, an educated guess, a guess to move the discussion forward. One example is BABIP, which we really don't know how to split up between the pitcher and his fielders very well, or at least not in a way that we can explain clearly. If I say that how much to regress Kershaw's 2014 BABIP depends on his BABIP in 2013, 2012, and 2011, that looks really confusing, even if all I'm trying to do is understand his 2014 performance on its own. It's really, really hard. So I just say: split the difference, and assume his share of responsibility for BABIP is halfway between his observed performance and league average.
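The "split the difference" rule is simple enough to show in a few lines of Python. The function name, the weight parameter, and the inputs below are hypothetical; weight=0.5 is the halfway rule. A pitcher who held BABIP 40 points under league average on 500 balls in play gets credit for 10 hits saved rather than the 20 he actually "saved".

```python
def credited_babip(observed_babip: float, league_babip: float,
                   weight: float = 0.5) -> float:
    # weight = 0.5 splits the difference between observed and league average.
    return weight * observed_babip + (1.0 - weight) * league_babip


balls_in_play = 500                # hypothetical
observed, league = 0.260, 0.300    # hypothetical
credited = credited_babip(observed, league)
hits_saved_observed = balls_in_play * (league - observed)
hits_saved_credited = balls_in_play * (league - credited)
print(round(credited, 3), round(hits_saved_observed, 1), round(hits_saved_credited, 1))
```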
Another one we have to handle is relief pitchers and leverage. Again, to move the discussion forward, we credit the reliever not with the Leverage Index he actually faced, but with a value halfway between that and the league average, which is 1 by definition. It's part of a concept called chaining: if that reliever weren't there, some other reliever would have taken his place. But much like Ozzie Smith's fielding is leveraged at SS (he's involved in more plays than he would be in LF) or Rickey Henderson's hitting is leveraged at leadoff (so he gets FAR more PA than the average hitter), we can't completely discount the talent associated with those leverageable opportunities.
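A small Python sketch of the chaining credit. It assumes the halfway Leverage Index is applied as a multiplier on the reliever's context-neutral wins; the function names and the 2.0-win input are hypothetical. A reliever who faced an average LI of 1.8 is credited with 1.4.

```python
def credited_leverage(gm_li: float) -> float:
    # Chaining: halfway between the leverage he faced and the league average of 1.
    return (gm_li + 1.0) / 2.0


def reliever_war(context_neutral_wins: float, gm_li: float) -> float:
    # Assumption: the leverage credit multiplies context-neutral wins.
    return context_neutral_wins * credited_leverage(gm_li)


print(credited_leverage(1.8))    # 1.4
print(reliever_war(2.0, 1.8))    # 2.8
```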
***
So let me just say this: to make sure the metric is not a black box, to make sure it is accessible, and to make sure that anyone can calculate their own version of WAR, the framework is flexible enough to allow that to happen.
We do not want it to be complex, or (too) nuanced, or too sophisticated. We want it so that anyone can build a house, and WAR gives you that blueprint. The potential saberist out there is now empowered and is given a path to build an even better house. The foundation is there.
We can say that a house is not impressive, or we can say that a house is incredibly impressive. Either way, WAR has been able to cut through the idea that we need something complex to be able to explain something as complex as baseball.