Tangotiger Blog

Friday, February 01, 2019

Park factor v sequencing

By Tangotiger

In this post, Hareeb makes this observation:

As it turns out, properly removing park factor noise (wRC+) is more important than capturing sequencing (Runs Scored).

I never really thought about it, but it seems like an insightful observation. Could we have figured that out without doing a regression? Let's see. I've never done this before, so let's see where it takes us.

As Hareeb reminds us, high runs scored is based on:

high offensive talent (think true talent wOBA + true talent baserunning)
timing of good events
run-friendly parks?

Which is the most impactful? We can try to make a decent estimate. Let's take them one at a time.

Spread in team talent (offense and defense)

One standard deviation in observed win% is about .072, which means that, once we remove random variation over 162 games, we can infer that one SD of true talent is .060.
And since offense = defense, then we can estimate that one SD of win% attributed to offense is 1/root2 of .060, or .042.
And since 10 runs ~ 1 win, then one SD of true talent run scoring per game is 0.42
So over 162 games, that's 1 SD = 68 runs of true talent (or 1 SD = 82 runs as observed)
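The chain of arithmetic above can be reproduced in a few lines. This is only a sketch: the function name is made up for illustration, and two steps are my assumptions rather than anything spelled out in the post, namely that random variation in win% is binomial (SD = root of .25/162) and that the observed 82 comes from applying the same offense/defense split and runs conversion to .072.

```python
import math

GAMES = 162
RUNS_PER_WIN = 10  # the post's rule of thumb: 10 runs ~ 1 win

def talent_spread(observed_sd_winpct=0.072, games=GAMES):
    # Assumption (not spelled out in the post): luck in win% is binomial.
    random_sd = math.sqrt(0.5 * 0.5 / games)
    # Variances add, so true-talent variance = observed variance - random variance.
    true_sd = math.sqrt(observed_sd_winpct**2 - random_sd**2)
    # Offense = defense, so each gets half the variance (1/root2 of the SD).
    offense_sd = true_sd / math.sqrt(2)
    true_runs = offense_sd * RUNS_PER_WIN * games
    # Assumption: the observed figure uses the same conversion on .072.
    observed_runs = observed_sd_winpct / math.sqrt(2) * RUNS_PER_WIN * games
    return true_sd, offense_sd, true_runs, observed_runs

true_sd, offense_sd, true_runs, observed_runs = talent_spread()
print(f"true-talent SD of win%:     {true_sd:.3f}")
print(f"offense share of that SD:   {offense_sd:.3f}")
print(f"true-talent runs, 1 SD:     {true_runs:.0f}")
print(f"observed runs, 1 SD:        {observed_runs:.0f}")
```

Carrying full precision gives roughly 69 runs of true talent rather than 68; the post rounds at each step, and the difference doesn't change anything that follows.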

Spread in sequencing

Roughly speaking, one SD of random variation of wOBA over 162 games is: 0.5 x root(38 PA x 162 G) = 39, which we can scale to runs by multiplying by 0.8, giving 31 runs
If we add the random variation of wOBA to the true talent of team, we get one SD = 74 (root of 68^2+31^2)
We are still short of the observed 82 by about 34 runs (subtracting variances, not SDs), which is probably the effect of sequencing. I don't necessarily like this "leftover" approach, but we just need a decent starting point
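Here is a rough sketch of the "leftover" calculation, using the post's inputs (68 runs of true talent, 82 runs observed, 38 PA per game). The helper names are hypothetical, and the key assumption is that the sources of variation are independent, so their variances add.

```python
import math

def woba_random_sd_runs(pa_per_game=38, games=162, sd_per_pa=0.5, runs_scale=0.8):
    # Random variation grows with the root of the number of PA,
    # then gets scaled to runs by the post's x0.8 factor.
    return sd_per_pa * math.sqrt(pa_per_game * games) * runs_scale

def combine_sd(*sds):
    # Independent sources: add variances, then take the root.
    return math.sqrt(sum(sd**2 for sd in sds))

def leftover_sd(total_sd, *explained_sds):
    # Whatever variance is not explained is attributed to the leftover source.
    return math.sqrt(total_sd**2 - sum(sd**2 for sd in explained_sds))

TRUE_TALENT_RUNS = 68  # from the post
OBSERVED_RUNS = 82     # from the post

woba_runs = woba_random_sd_runs()
combined = combine_sd(TRUE_TALENT_RUNS, woba_runs)
leftover = leftover_sd(OBSERVED_RUNS, TRUE_TALENT_RUNS, woba_runs)

print(f"random variation of wOBA: {woba_runs:.0f} runs")
print(f"talent + random:          {combined:.0f} runs")
print(f"leftover (sequencing?):   {leftover:.0f} runs")
```

Depending on where you round, the leftover lands at 33 to 35 runs, which is in line with the 34 quoted above.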

Spread in parks

One SD in park factors is probably 5%, which means that with ~ 4.5 x 162 = 729 runs, 5% is 36 runs

Sooooo... the spread in parks, the spread in sequencing, and the spread in random variation are all very similar, with parks taking a slight lead! At least using this approach.
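As a quick check, the park figure and the three-way comparison can be put side by side. This is a sketch with a made-up function name; the 31 and 34 are simply the rounded values from the sections above, and the park inputs (5%, 4.5 runs per game) are the post's own rough guesses.

```python
def park_sd_runs(park_sd_pct=0.05, runs_per_game=4.5, games=162):
    # One SD of park factor applied to a full season of run scoring.
    return park_sd_pct * runs_per_game * games

spreads = {
    "random variation of wOBA": 31,   # rounded value from the post
    "sequencing (leftover)": 34,      # rounded value from the post
    "parks": park_sd_runs(),
}

# Largest spread first.
ranked = sorted(spreads.items(), key=lambda kv: kv[1], reverse=True)
for name, sd in ranked:
    print(f"{name}: {sd:.0f} runs")
```

The gaps between the three are only a few runs, so small changes to any of the inputs could reorder them.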

Hareeb points out that:

wOBA + minimizing park effect = wRC+
wOBA + park + sequencing = Runs Scored

And since wRC+ beat out Runs Scored, that means the gain from neutralizing park effects outweighs the loss from ignoring sequencing! A brilliant observation. And given my approach, I would have expected something pretty close to that (though not necessarily to that magnitude).

Fantastic, I learned something new!
