Monday, August 18, 2014
Better BBWAA Cy model
I'd like for someone to develop this algorithm. I've discussed how to do it manually, and I tried to come up with a simple point system. The simple point system isn't going to cut it for the nuances we need.
This is what the aspiring saberist should try to think about (and you should best-fit it to the 2006-2013 data):
1. Figure out how to balance the "L" in a W/L record. Presumably, the L will carry a weight of -1/2 or -1/3 points, relative to 1 point for the W.
2. Figure out how to balance ERA and W/L, and at what point it is "too close" to call and tiebreakers are needed. For example, say that one pitcher has a 2.00 ERA and another has a 3.00 ERA, but the second guy has a 20-10 record while the first guy is at 16-10. As we know, we declare the winner if a pitcher leads in both. But is the huge ERA advantage enough to declare a winner on its own, even though he has 4 fewer wins?
So, that's what you have to figure out: at what point is it too close to call when one player leads in one category and the second leads in the other?
3. Then decide on all the tiebreakers, notably, IP, K, and CG, and maybe shutouts and team finish.
You may also find that how it works for the top 2 pitchers won't necessarily work for the rest of the pitchers. It might become more of a jumble in the down-ballot votes.
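To make the three steps concrete, here is a rough Python sketch of how a head-to-head comparison might be wired up. Everything numeric in it is a placeholder assumption, not a fitted value: the names `loss_weight`, `era_per_point` (how much ERA is worth one W/L point), and `margin` (the "too close to call" band) are hypothetical, as are `Season`, `compare`, `accuracy`, and `best_fit`. The tiebreaker order (IP, then K, then CG) is also just one possible choice. You would have to supply the actual 2006-2013 voting results yourself as `matchups`.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Season:
    name: str
    w: int
    l: int
    era: float
    ip: float
    k: int
    cg: int


def wl_points(p, loss_weight=0.5):
    # Step 1: 1 point per W, minus loss_weight per L (assumed weight).
    return p.w - loss_weight * p.l


def compare(a, b, loss_weight=0.5, era_per_point=0.25, margin=1.0):
    """Return the predicted winner of a vs. b, or None if still tied."""
    wl_gap = wl_points(a, loss_weight) - wl_points(b, loss_weight)
    # Step 2: convert the ERA gap to W/L points; positive favours a.
    era_gap = (b.era - a.era) / era_per_point
    # A pitcher who leads in both categories wins outright.
    if wl_gap > 0 and era_gap > 0:
        return a
    if wl_gap < 0 and era_gap < 0:
        return b
    total = wl_gap + era_gap
    if abs(total) > margin:
        return a if total > 0 else b
    # Step 3: too close to call, so fall back to tiebreakers in order.
    for key in ("ip", "k", "cg"):
        x, y = getattr(a, key), getattr(b, key)
        if x != y:
            return a if x > y else b
    return None


def accuracy(matchups, **params):
    # matchups: list of (actual_winner, actual_loser) Season pairs.
    hits = sum(compare(w, r, **params) is w for w, r in matchups)
    return hits / len(matchups)


def best_fit(matchups,
             loss_weights=(1 / 3, 1 / 2),
             era_steps=(0.1, 0.25, 0.5),
             margins=(0.5, 1.0, 2.0)):
    # Brute-force grid search over the assumed candidate values.
    grid = product(loss_weights, era_steps, margins)
    return max(grid, key=lambda g: accuracy(
        matchups, loss_weight=g[0], era_per_point=g[1], margin=g[2]))
```

With these placeholder defaults, the 16-10, 2.00 ERA pitcher against the 20-10, 3.00 ERA pitcher nets out to exactly zero (4 points of ERA against 4 points of wins), so the call falls through to the tiebreakers. That is the kind of boundary the fit has to pin down. This only handles pairs, so the down-ballot jumble would need something more.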
***
Now, you may think: what's the practical purpose of this for the aspiring saberist? Well, it's important to get into trying to create models. It makes you think in different ways, and gets you to approach the issue from a different angle. You may even get into a paradigm shift, in which case you'll have succeeded, regardless of what you come up with.