Tangotiger Blog

Saturday, March 02, 2019

Bayesian Run Support

In his series on Game Score, Bill James highlighted the case of Mike Mussina and Roger Clemens in 2001, both pitching for the Yankees. Across the board, Mussina had the better season, but Clemens won the Cy Young, in large part due to his 20-3 record, compared to Mussina's 17-11. Since they were on the same team, a priori we would presume similar context. And therefore one would need to conclude that Clemens being 5.5 "games ahead" of Mussina in the W-L record is a reflection of Clemens himself.

But, we understand better than that. We understand that the TEAM SEASONAL run support does not itself imply that this is what we'd observe over 33 or 34 starts. What we ultimately care about is not the PRIOR but the POSTERIOR.

What's the difference? A priori is knowledge before observation; a posteriori is knowledge after observation. So, "knowing nothing about nothing", our Prior in determining the run support for Clemens and Mussina is about 5 runs per start, since the Yankees scored 5 runs per game. Now the observation. Let's assume we don't know exactly how many runs the Yankees scored for Mussina and Clemens, and all we have is their W-L and ERA (or RA/9). A 20-3 record in 33 starts and 220 IP with a 3.84 RA/9 for Clemens would likely imply 6.8 runs of support per 9 IP. That's our observation.
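One way to get from a W-L record and RA/9 to an implied run support figure is sketched below. The post doesn't spell out the method, so this is a guess: it assumes no-decisions are split evenly between wins and losses, and that win% follows the Pythagorean relation with an exponent of 2. The function name and parameters are made up for illustration. Under those assumptions, the Clemens line comes out at the same 6.8.

```python
import math


def implied_run_support(wins, losses, starts, ra9, exponent=2.0):
    """Back out run support per 9 IP from a W-L record and RA/9.

    Assumptions (not stated in the post):
      - no-decisions count as half a win and half a loss
      - win% = RS**x / (RS**x + RA**x), the Pythagorean relation
    """
    no_decisions = starts - wins - losses
    win_pct = (wins + 0.5 * no_decisions) / starts
    # Invert the Pythagorean relation: RS / RA = (W% / (1 - W%)) ** (1 / x)
    ratio = math.pow(win_pct / (1.0 - win_pct), 1.0 / exponent)
    return ra9 * ratio


# Clemens 2001, using only the numbers quoted above
clemens = implied_run_support(wins=20, losses=3, starts=33, ra9=3.84)
print(round(clemens, 1))  # 6.8
```

A different exponent, or a different treatment of the no-decisions, would move the answer, so treat the output as a rough implication rather than a measurement.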

Our Posterior requires that we consider both our prior (the 5 runs of the Yankees for the season) and our observation (or the implication of the observation) of 6.8. Therefore, our posterior will be some combination of the two. Without trying to come up with the proper balance, let's just split the difference and set our Posterior at 5.9. In other words, knowing the Yankees' run support, and observing what the Yankees did specifically with Clemens, we estimate his run support at 5.9 runs per start.

Doing the same with Mussina, our implied observation is 4.1 runs per start, so our Posterior is about 4.5 runs per start.

And what was it in fact? Clemens was 5.7, so our Posterior was pretty close. And Mussina was 4.2, also decent.

So, absent knowing their actual run support, what is our best approach to estimating their run support:

  • using only their prior (all Yanks starters get 5 runs per start)
  • using only the observation implied by their W-L and RA9 (Clemens gets 6.8 and Mussina gets 4.1)
  • using their posterior (meaning prior and observation)
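The three options can be lined up against the actual figures quoted above. This is a minimal sketch: the `blend` helper and the 50/50 weight are just the "split the two" shortcut from the post, not a properly derived Bayesian weight, and the error measure (sum of absolute misses across the two pitchers) is my own choice for illustration.

```python
def blend(prior, observed, weight_on_prior=0.5):
    """Weighted mix of prior and observation (0.5 = split the two)."""
    return weight_on_prior * prior + (1.0 - weight_on_prior) * observed


PRIOR = 5.0  # Yankees runs per game, from the post

# Implied observation and actual run support, as quoted in the post
pitchers = {
    "Clemens": {"observed": 6.8, "actual": 5.7},
    "Mussina": {"observed": 4.1, "actual": 4.2},
}

# Total absolute miss for each approach, summed over both pitchers
errors = {"prior": 0.0, "observation": 0.0, "posterior": 0.0}
for name, p in pitchers.items():
    posterior = blend(PRIOR, p["observed"])
    errors["prior"] += abs(PRIOR - p["actual"])
    errors["observation"] += abs(p["observed"] - p["actual"])
    errors["posterior"] += abs(posterior - p["actual"])

for approach, err in errors.items():
    print(f"{approach:12s} total miss: {err:.2f}")
```

With an even split, the posteriors are 5.9 and 4.55, and the combined miss is roughly 1.5 runs for the prior alone, 1.2 for the observation alone, and 0.55 for the posterior. Two pitchers is a tiny sample, and Mussina's observation alone happens to land closer than his posterior, but the blend does best overall here.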

Clearly, it's the posterior. And in part 2, we'll look at the Bayesian BABIP. As soon as I write it.


(2) Comments • 2019/03/06 • Statistical_Theory
