
FanPost

The Bayes of Summer

As of this writing, the Mariners are 13-2 for an .867 winning percentage, which means they are on pace to win 140 games this year. "Pshaw," you say? (Baseball fans are relics, and use expressions like "Pshaw.") You know that even the sainted Boone/Ichiro! M’s won only 116 games (.716 winning percentage), that they were pretty lucky, and that nobody’s ever done better. Former player of the week Tim Beckham has already seen his wRC+ drop to a measly 200 or so, Jay Bruce isn’t going to hit 75.6 home runs, and there’s even a chance Daniel Vogelbach (The Danbino?) won’t be able to maintain his 318 wRC+. Also, Mariners.

So how many wins will the Mariners have? One way to figure it is to take your beginning-of-the-year projection (let’s say 75 wins) and note that the M’s have probably banked 6 more wins than expected over the first couple of weeks. If you figure they will play like a 75-win (.463 winning percentage) team the rest of the year, you might update your estimate to 81 wins. But are the M’s really a 75-win true-talent team that’s going to play .463 ball the rest of the way? Beckham and Bruce really do look ready for bounce-back years. Vogelbach may not keep hitting like Babe Ruth on stanozolol, but he looks to be a solid contributor. How good are the M’s really?
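For anyone who wants to check that back-of-the-napkin math, here is a minimal sketch. The variable names are mine, and the only inputs are the numbers already quoted above: a 162-game season, a 13-2 start, and a 75-win preseason guess.

```python
# Naive update: bank the 13-2 start, then assume preseason talent the rest of the way.
GAMES = 162
wins, losses = 13, 2
preseason_wins = 75  # the beginning-of-the-year projection used in the text

played = wins + losses
preseason_pct = preseason_wins / GAMES        # about .463
expected_so_far = preseason_pct * played      # wins a 75-win team "should" have by now
banked = wins - expected_so_far               # extra wins already in hand
naive_total = wins + preseason_pct * (GAMES - played)

print(f"current pace: {wins / played * GAMES:.0f} wins")
print(f"banked wins:  {banked:.1f}")
print(f"naive total:  {naive_total:.0f} wins")
```

The catch, of course, is the assumption baked into `preseason_pct`: it treats the hot start as pure luck and learns nothing from it.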

One way to come up with an answer is to use Bayesian statistics. I’m not going to cite Bayes’ theorem or give a treatise on how to apply it, because that would be boring, I’d probably get it wrong, and the Google will lead you to much good content if you are interested. I will say that Bayesian statistics is just a formalized way to do the type of figuring we all do. We have some prior estimate of how good the team is, and we have some new evidence (they went 13 and 2!!!), so we update our estimate with the new information to get what’s called a posterior estimate (insert Kyle Seager posterior estimate joke here).

If you were a Bayesian from Mars who didn’t know anything about baseball, and figured teams are just randomly chosen from the population, your prior assumption would be that the M’s winning percentage could be anything. That’s called a uniform prior.

Prior’s Uniform

I said a uniform prior (a Beta(1,1) distribution), not a Prior uniform:

The alien thinks a baseball team’s winning percentage is equally likely to be anything, so it’s got a 95% chance of being between .025 and .975. If the M’s play 15 games and win 13 of them, that record would be the only additional evidence of team quality, so the alien would add the 13 wins and 2 losses to its prior to get this (that’s a Beta(14, 3) distribution).

That distribution peaks at .867, but half of it is to the left of the blue dotted line (the median), which sits a little below the peak. The M’s would be just as likely to be below that winning percentage as above.
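Here is a rough sketch of the alien’s update, using nothing but the Python standard library. The helper names `beta_pdf` and `beta_median` are made up for this example, and the median is approximated on a fine grid rather than computed exactly.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_median(a, b, steps=100_000):
    """Approximate the median by accumulating density on a grid (midpoint rule)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += beta_pdf(x, a, b) * width
        if total >= 0.5:
            return x
    return 1.0

prior_a, prior_b = 1, 1                              # uniform prior, Beta(1, 1)
wins, losses = 13, 2
post_a, post_b = prior_a + wins, prior_b + losses    # posterior, Beta(14, 3)

mode = (post_a - 1) / (post_a + post_b - 2)          # where the curve peaks
mean = post_a / (post_a + post_b)
median = beta_median(post_a, post_b)                 # the 50/50 point

print(f"Beta({post_a}, {post_b}): mode {mode:.3f}, median {median:.3f}, mean {mean:.3f}")
```

The mode is just 13/15 again, which is why the peak lines up with the raw winning percentage. The median and mean land lower because the uniform prior drags a little weight back toward .500.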

Now let's say you know ever so slightly more about baseball than a Martian, but have no clue about the M’s (i.e. you are an East-coast baseball pundit). You would start with the assumption that teams average .500 and generally win between 60 and 100 games. Something roughly like this:

So not knowing anything about a team, if someone told you "Hey my team went 13 and 2 to open the year," you might guess the team was this good:

You would think that a team that opens the season 13-2 is roughly a 94-win team, but one with a 95% chance of winning between 75 and 112. (FYI, the chance of winning 117 games would be .005.)
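The pundit’s update can be sketched the same way. Big caveat: the exact prior behind the chart isn’t given here, so the Beta(27, 27) below is my own stand-in. It is centered on .500 and puts roughly 95% of its weight between about 60 and 102 wins, which is in the neighborhood described above. The interval is estimated by simulation with `random.betavariate`, so the endpoints are approximate.

```python
import random

GAMES = 162
random.seed(2019)  # fixed seed so the sketch is repeatable

# ASSUMPTION: Beta(27, 27) is a hypothetical "league-average team" prior,
# chosen only because it roughly matches the description in the text.
prior_a, prior_b = 27, 27
wins, losses = 13, 2
post_a, post_b = prior_a + wins, prior_b + losses    # Beta(40, 29)

post_mean = post_a / (post_a + post_b)

# Estimate the middle 95% of the posterior from sorted random draws.
draws = sorted(random.betavariate(post_a, post_b) for _ in range(100_000))
low = draws[int(0.025 * len(draws))]
high = draws[int(0.975 * len(draws))]

print(f"posterior mean: {post_mean:.3f} ({post_mean * GAMES:.0f} wins)")
print(f"95% interval:   {low * GAMES:.0f} to {high * GAMES:.0f} wins")
```

With this stand-in prior, the posterior mean works out to about 94 wins and the interval lands close to the 75-to-112 range quoted above. A different but similar prior would shift those numbers a bit.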

But of course we DO know something about the M’s, so we can use a better prior estimate than a Martian or an East-coast sports writer. In fact, we have some prior estimates from the site editors’ season preview. If we bucket them in 3-win ranges, they look like this:

So we want to start with a rough estimate that peaks around 75 wins (.463 winning percentage) but has a decent chance of being 80 (.494) or 70 (.432). The chart below shows a prior that has about a 50% chance of being in that 70-80 range and a 95% chance of being between 59 and 91 wins.

Because the Beta distribution is literally made for doing this kind of figuring, we can just add our evidence to our priors. When we do, our estimates shift like so.

So our Bayesian estimate would say that the M’s are most likely an above-.500 team. Note that this is not an estimate of how many games they’ll win this year - it’s an estimate of their true-talent winning percentage, which means, given the information we have, they should play about .515 ball going forward, which would leave us at 89 wins total (chance of winning 117 roughly .000). As I write this, fivethirtyeight.com projects the M’s for an 87-win season, because they know that the competition we’ve faced so far isn’t as good as what’s coming up and that you can’t just use the simplistic calculation I used.
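To make that last step concrete, here is a minimal sketch of the whole calculation. Again, the prior parameters are an assumption on my part: Beta(47, 54) is a hypothetical prior that peaks near 75 wins with roughly the 59-to-91 spread described above, not necessarily the one behind the chart.

```python
GAMES = 162
wins, losses = 13, 2

# ASSUMPTION: Beta(47, 54) is a hypothetical prior that peaks near 75 wins.
prior_a, prior_b = 47, 54
prior_mode_wins = (prior_a - 1) / (prior_a + prior_b - 2) * GAMES

# Beta prior + win/loss record = Beta posterior: just add wins and losses.
post_a, post_b = prior_a + wins, prior_b + losses    # Beta(60, 56)
true_talent = post_a / (post_a + post_b)             # posterior mean

# Keep the 13 wins already banked, then play at true talent the rest of the way.
remaining = GAMES - (wins + losses)
projected = wins + true_talent * remaining

print(f"prior peak:  {prior_mode_wins:.0f} wins")
print(f"true talent: {true_talent:.3f}")
print(f"projection:  {projected:.0f} wins")
```

With these made-up parameters the true-talent estimate comes out around .517, close to the .515 above, and the projection rounds to 89 wins. Note how little 15 games move a prior that is worth about 100 games of evidence.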

So, as you knew before reading this, if you want to know how many games the M’s are going to win, just go to FanGraphs or FiveThirtyEight. This has been your useless statistics lesson of the day. GOMS.