Friday, March 05, 2021
Kinematic Quadratic
Some math homework. Only jump over the line if you enjoy math.
Fifteen years ago, I tried to guesstimate the quality of plays.
Notably, I presumed 60% of the plays were "easy outs" with an average out rate of 98.3%. And another 20% that were "easy hits" with an average out rate of 5%. The other 20% of the plays were uniformly distributed between 10 and 95%, for an average of 50%.
Well, now with Statcast, we can come up with more precise numbers. 43% were easy outs at 98.4% and another 22% were almost-easy outs at 91.4%. The two combined are 65% of the plays at an out rate of 96.0%. So, not bad in terms of my baseball guts.
There's another 21% that were auto-hits (0% out rate), which compares very favorably to what I presumed.
The other 14% were 0 to 85% out rate for an average of 47.1%. That's reasonably in-line as well.
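If you want to check that arithmetic yourself, here is a tiny Python sketch. The bucket shares and out rates are the ones quoted above; the overall out rate at the end is just my own weighted average of them, not a number from the post.

```python
# Sanity-check the Statcast bucket numbers quoted above.
# (share of plays, out rate) for each bucket, as given in the text
buckets = [
    (0.43, 0.984),  # easy outs
    (0.22, 0.914),  # almost-easy outs
    (0.21, 0.000),  # auto-hits
    (0.14, 0.471),  # everything else
]

# combined rate of the two "easy" buckets: (.43*.984 + .22*.914) / .65 ~= .960
easy_share = buckets[0][0] + buckets[1][0]
easy_rate = (buckets[0][0] * buckets[0][1] + buckets[1][0] * buckets[1][1]) / easy_share
print(f"easy buckets: {easy_share:.0%} of plays at {easy_rate:.3f} out rate")

# overall out rate on balls in play implied by the four buckets (my own arithmetic)
overall = sum(share * rate for share, rate in buckets)
print(f"implied overall out rate: {overall:.3f}")
```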
Why was I able to come up with reasonable estimates 15 years ago with no granular data to speak of? There are two reasons:
Given that, anyone would have been able to come up with a similar distribution. That's how you need to approach analysis, not just in sports. You have to have some level of understanding of the data to expect. Without a prior distribution, you are really not going to be able to have much confidence in what you are doing.
I had a Twitter poll where most of the respondents were incorrect in their guess.
In games played by the Expos over 162 games, 52% of runners on base are from the Expos.
In games played by the Spiders over 162 games, 48% of runners on base are from the Spiders.
When the Expos play the Spiders in a 7 game series, what is the chance the Expos win the series?
The correct answer is 70% and I will explain how to get there.
The Expos batters get on base 13 times per game, with 26 batting outs, while the Expos pitchers allow 12 runners to get on base. In other words, of the 25 runners, the Expos batters generate 52% of them (13/25). The Spiders are the flip side, placing 12 and allowing 13.
In a head-to-head contest, what happens? If Expos generate 52% of the baserunners against an average team, then they’ll naturally generate more runners against the Spiders. Whether you use the Odds Ratio, or the simpler “Strat-O-Matic” method (52+50-48), the answer is 54%. So, right now, we have the Expos batters reaching base 13.5 times and the Spiders reaching base 11.5 times. That’s 54% of the runners being from the Expos. 13.5 times reaching base is an OBP of .342, and that’s the Expos OBP. The Spiders OBP is .307. The ratio of OBP of Expos/Spiders is 1.1145.
Runs scored to allowed is proportional to the square of the OBP ratio. So, the square of 1.1145 is 1.242. That’s 1.242 runs scored per run allowed.
Wins to losses is proportional to the square of the Runs ratio. So, the square of 1.242 is 1.543. That’s 1.543 wins per losses (or a .607 win%).
Win/loss ratio of a 7-game series is (roughly) proportional to the square of the Wins ratio. So, the square of 1.543 is 2.38. That’s 2.38 series wins per 1 series loss. And that’s 70%
And so, when one team generates 52% of the runners facing another team generating 48% of the runners, then over a 7 game series, the better of such teams will win 70% of the time. That’s the sliver of difference between teams, when those scoring confrontations get to compound into a series win.
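For anyone who wants to follow along, here is the whole chain in a few lines of Python, using only the numbers above (the Strat-O-Matic head-to-head share, then the three squaring steps). Nothing new here; it is just the arithmetic spelled out.

```python
# Head-to-head baserunner share for the Expos, via the "Strat-O-Matic" method: 52 + 50 - 48 = 54
share = 0.52 + 0.50 - 0.48

# With 25 runners per game between the two teams and 26 batting outs per side,
# 54% of the runners means 13.5 for the Expos and 11.5 for the Spiders.
expos_obp   = 13.5 / (13.5 + 26)   # ~.342
spiders_obp = 11.5 / (11.5 + 26)   # ~.307
obp_ratio = expos_obp / spiders_obp   # ~1.114

runs_ratio   = obp_ratio ** 2      # runs scored : runs allowed, ~1.24
wins_ratio   = runs_ratio ** 2     # wins : losses, ~1.54
series_ratio = wins_ratio ** 2     # 7-game series wins : losses, ~2.4

print(f"single game win%:   {wins_ratio / (1 + wins_ratio):.3f}")      # ~.607
print(f"7-game series win%: {series_ratio / (1 + series_ratio):.3f}")  # ~.70
```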
Bill has a tremendous article showing that batting averages bias MVP voting to a pretty large extent. Now, if you wanted to determine the EXTENT of the bias, there's a path there. While Bill used Win Shares as his central point, he did also do a quick overview with WAR, which is what I'll focus on here.
First, figure out how many hits above (or below) the league average the hitter has (using AB as your opportunity number). For example, if you have a .360 batting average in a league of .260 with 600 at bats, that's +.100 x 600 = +60 hits. You can now run a regression, but you can do a trial and error process as well, which is probably going to be more instructive. Give each extra-hit 0.01 WAR. So in the above example, you are giving +0.6 WAR. Go back to Bill's study, and now look to see if the bias still persists. You will probably not notice much difference. Try again with 0.02 WAR for each extra-hit, then try 0.03, then 0.05, and then 0.10. You may iterate downwards as you may have overcompensated.
So what we are doing here is building-in the bias into the model, so that there is no bias in the output. Once you have something close to that, then congratulations, you have now figured out the extent that batting average biases the MVP voting.
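A rough sketch of that trial-and-error loop, in Python. The column names (ba, lg_ba, ab, war, vote_share) are placeholders for whatever table of MVP vote-getters you build, and the bias measure here (correlation between BA and the residual of vote share on adjusted WAR) is just one simple stand-in for re-running Bill's study, not his method.

```python
import numpy as np
import pandas as pd

# df is assumed to have one row per MVP vote-getter with (hypothetical) columns:
#   ba, lg_ba, ab, war, vote_share
def remaining_bias(df: pd.DataFrame, war_per_extra_hit: float) -> float:
    """Bias left over after giving each extra hit a WAR bonus: the correlation between
    batting average and the residual of vote share regressed on adjusted WAR."""
    extra_hits = (df["ba"] - df["lg_ba"]) * df["ab"]      # e.g. (.360 - .260) * 600 = +60
    adj_war = df["war"] + war_per_extra_hit * extra_hits
    slope, intercept = np.polyfit(adj_war, df["vote_share"], 1)
    residual = df["vote_share"] - (slope * adj_war + intercept)
    return float(np.corrcoef(df["ba"], residual)[0, 1])

# Trial and error, as described above: the bonus where the bias is ~0 is your answer.
# for bonus in (0.01, 0.02, 0.03, 0.05, 0.10):
#     print(bonus, remaining_bias(df, bonus))
```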
This one is for all you math teachers out there, who like to use sports for your examples.
The math is basically straightforward, so let me go through an illustration. Looking at each half-inning through the first 8 innings in 2019, 71% of half-innings were scoreless. And therefore, for both half-innings to be scoreless (and therefore still tied) is 71% times 71%. That’s 50%.
The percentage of half-innings (in regulation) where exactly one run was scored was 15%. And so 15% times 15% is 2%. Therefore, we’re at 50% tied with 0-0 scored in the first extra inning, and 2% tied with 1-1 scored in the first extra inning. After that, we get into rare scenarios. Add it all up, and there’s a 53% chance of the game being tied after the first extra inning. So, that’s what we’d expect, if extra inning scoring patterns followed regulation scoring patterns.
In 2019, 55.9% of extra innings were still tied after the first extra inning, and since 2010 it was 55.4%. Therefore, our expectation using regulation scoring was pretty close to reality.
The other thing we notice with scoring is the following: the chance of exactly 1 run scoring in a half inning in regulation is 15.3%. Scoring exactly 2 runs is 7.5%, or about half as much as 1 run. Scoring exactly 3 runs is 3.5%, or about half as much as 2 runs. And so on. Therefore, we could use this shorthand:
When we use this model, we still get 53% still tied after the first extra inning.
We can apply this kind of thinking with a runner on 2B and 0 outs. The historical chance of a runner scoring in this situation is a bit over 60%. That means the chance that this lead runner does not score (ergo, a scoreless inning) is 40%. So, we end up with this presumed model:
Since we now have a presumed distribution of run-scoring, we can figure out the chance of an inning being tied. So, that would be 40% x 40%, plus 30% x 30% plus 15% x 15% and so on. That gives us 28%. Therefore, I’d expect around a 28% chance of the game still being tied after the first extra inning. JJ Cooper of Baseball America reported that in the minor leagues in 2018-19, it was 27% still tied.
Therefore, I’d suggest this basic model works well enough. And it opens up the door to figuring out the chance of the game still tied after two extra “accelerated” innings: 28% x 28% = 8%. And the chance still tied after three extra “accelerated” innings at 28% to the power of 3, or 2%. And on and on.
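Here is a small Python sketch of both presumed models: the regulation-style one (71% scoreless, 15% one run, then roughly halving) and the runner-on-2B one (40% scoreless, then the 30/15/... halving pattern described above).

```python
def p_still_tied(dist, terms=10):
    """Chance both half-innings produce the same number of runs,
    given a per-half-inning run distribution (P(0), P(1), P(2), ...)."""
    return sum(p * p for p in dist[:terms])

def halving(first_two, terms=10):
    """Distribution that starts with the given probabilities, then halves each step,
    per the shorthand in the text."""
    dist = list(first_two)
    while len(dist) < terms:
        dist.append(dist[-1] / 2)
    return dist

regulation   = halving([0.71, 0.15])   # 71% scoreless, 15% one run, then ~halving
runner_on_2b = halving([0.40, 0.30])   # 40% scoreless, 30% one run, then ~halving

print(f"still tied, regulation-style extra inning: {p_still_tied(regulation):.0%}")      # ~53%
print(f"still tied, runner-on-2B extra inning:     {p_still_tied(runner_on_2b):.0%}")    # ~28%
print(f"still tied after three such innings:       {p_still_tied(runner_on_2b)**3:.0%}") # ~2%
```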
This is for the bottom of the 10th or later innings, with a runner placed on second base.
This is how to read each line:
If you need a really really quick shorthand:
So, just remember “80% chance of winning” for the home/batting team tied, and then keep dividing by 2 for each run.
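As a toy Python function (assuming "keep dividing by 2" means per run the batting team trails):

```python
def home_win_shorthand(runs_down: int) -> float:
    """Quick shorthand from above: ~80% for the home/batting team when tied,
    then divide by 2 for each run they trail (my reading of the shorthand)."""
    return 0.80 / (2 ** runs_down)

for deficit in range(4):
    print(f"down {deficit}: ~{home_win_shorthand(deficit):.0%}")
```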
I posted on Twitter something that is common knowledge among the saber folk, the 10:1 runs:wins relationship.
What is somewhat common knowledge is the relationship of bases and outs to Runs scored. Bill James taught us that with Runs Created, as a function of OBP times SLG.
What might be less common knowledge is how wOBA fits into this. wOBA is scaled to OBP and is proportional to SLG. And therefore, wOBA squared is proportional to runs scored.
However, when we talk about individual players, we really prefer to report in terms of wOBA and not wOBA squared. That’s because, at least for hitters, their impact to a team follows a linear approach, not a squared approach. This is why Linear Weights, not (the basic version of) Runs Created is preferred. And this is why a Runs Created approach that goes through a “theoretical team approach” is preferred. In other words, we can apply the Runs Created concept, but with about 8/9ths of it being linear. I hope that made sense.
So, if we want to know about how talented a team of batters is, we’d average their wOBA, not their wOBA squared (aka Runs). At the individual game level, it gets even worse, because that squared approach will exaggerate the impact beyond what it really is. In other words, there’s a certain level of “running up the score” because of the way baseball is built.
And so, I thought: why don’t we take the square root of the runs scored and runs allowed? And then take the difference? And wouldn’t you know it: it’s (slightly) better than taking the actual difference in runs scored. I looked at the 660 team-seasons since 1998: 371 teams were closer to their actual W/L record following the Square Root of Runs (Root Runs) approach, while 289 teams were closer using the straight Run Differential approach. That’s 56% to 44%, which is fairly resounding as far as these things go.
The one place I’d be a bit worried, but not too much, is how it relates to pitchers. Pitchers interact with themselves. And so, you DO want a Runs (or wOBA squared) approach. However, adding that up at the game level probably hurts more than it helps. In other words, things get exaggerated at the game level and so, it might still work out going with a wOBA (or Root Runs) approach.
Anything more, and that’s for aspiring saberists to tackle. Actually, the veteran saberists should as well. This is not as obvious as it looks.
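If an aspiring saberist wants a starting point, here is one way to set up the Root Runs comparison in Python. The column names are placeholders, the data loading is left out, and the "closer to actual W/L" test is implemented as a simple best-fit line for each predictor, which is one reasonable reading of the comparison above rather than the exact test.

```python
import numpy as np
import pandas as pd

# games is assumed to have one row per team-game with (hypothetical) columns:
#   team_season, runs_scored, runs_allowed, won
def season_predictors(games: pd.DataFrame) -> pd.DataFrame:
    g = games.copy()
    g["run_diff"]  = g["runs_scored"] - g["runs_allowed"]
    g["root_diff"] = np.sqrt(g["runs_scored"]) - np.sqrt(g["runs_allowed"])
    return g.groupby("team_season").agg(
        win_pct=("won", "mean"),
        run_diff=("run_diff", "mean"),
        root_diff=("root_diff", "mean"),
    )

def closer_counts(seasons: pd.DataFrame) -> tuple[int, int]:
    """For each team-season, which predictor's fitted W% lands closer to the actual W%?"""
    errors = {}
    for col in ("run_diff", "root_diff"):
        slope, intercept = np.polyfit(seasons[col], seasons["win_pct"], 1)
        errors[col] = np.abs(seasons["win_pct"] - (slope * seasons[col] + intercept))
    root_closer = int((errors["root_diff"] < errors["run_diff"]).sum())
    return root_closer, len(seasons) - root_closer
```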
Good work from Jim here, if you focus on the things he's doing, and not jump to any overall conclusions. Think about the title of this thread: evaluating the PLAY v attributing the influence of the player on the play. It's going to explain why you see the results you see, and why you should be careful with the conclusions. I've talked about this dozens of times, and it'll save you a lot of head scratching if you can keep remembering this.
Eventually, once we roll out Layered Hit Probability, it will all make sense.
This is an expansion of a twitter post I made, though it will not be fully expanded. Indeed, I have no idea how long I will take to write this blog post, but I expect it will be less than ten minutes.
Bill James had an article a few years ago saying that you could summarize the history of the Roman Empire in one sentence. Or one paragraph. Or several paragraphs. Or a book. Or an encyclopedia(*). In other words, however deep you wanted to go into the abyss, we could go.
(*) Assuming you know what that is.
First thing you want to know is the distribution of the talent level of the teams. Only god knows. But, we can infer it based on observations. If we observe that the win% over 162 games has one standard deviation of .072(**), then the TRUE distribution is .060. We get that as:
.072^2 = true^2 + random^2
Where random is .5/root(N), where N = 162, and .5 is the root of p*q, where p is the average win percentage of .5 and q is 1-p.
(**) That's the historical average at some point, and I don’t know what it is more recently, though it can’t be that much different if you look at it over a few years.
So, we can reasonably estimate that in MLB the true talent distribution at the team level is one SD = .060 (***). To figure out the difference in talent between two random teams in this distribution, it is simply root 2 times .060 or .085.
(***) Knowing that, you can ALSO estimate the talent distribution at the player level! That’s another blog entry.
Knowing the standard deviation is one thing. What we want to know is the average difference. And roughly speaking, that is about 80% of the standard deviation. So, .085 x 80%. Therefore, the average difference is just under .070. Then we have the home site advantage, which lets worse teams beat better teams (but also allows better teams to not let random variation beat them). In MLB, with the home site advantage at about 54%, it doesn’t really change much, pushing it above .070.
And so, in MLB, if you have two teams, and you KNOW which is the more talented team, then the more talented team will win 57% of the time. On average.
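The whole chain, in a few lines of Python. The sqrt(2/pi) factor is where the "about 80% of the standard deviation" comes from, and the last line uses the simple head-to-head (Strat-O-Matic style) conversion, ignoring the home site advantage.

```python
import math

N = 162
observed_sd = 0.072                          # historical SD of team season win%
random_sd = 0.5 / math.sqrt(N)               # binomial noise, sqrt(p*q/N) with p = q = .5
true_sd = math.sqrt(observed_sd**2 - random_sd**2)   # ~.060 true-talent SD

diff_sd = math.sqrt(2) * true_sd             # SD of the talent gap between two random teams, ~.085
avg_gap = diff_sd * math.sqrt(2 / math.pi)   # mean absolute gap; sqrt(2/pi) ~ 0.80, hence "about 80% of the SD"

# Head-to-head (Strat-O-Matic style), the better team wins about .500 + gap of the time
print(f"true talent SD: {true_sd:.3f}")
print(f"average talent gap: {avg_gap:.3f}")
print(f"better team wins about {0.5 + avg_gap:.0%} of the time")
```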
Suppose you had a ranking of starting pitchers, and you wanted to convert that into "points". How would you do that? You might be tempted to look at your 150 SP and give 150 points to the first place pitcher and 149 to the second, all the way down to 1 for the 150th. That would however imply that the value spacing between each pitcher matches the ranking spacing exactly. But we all know the gap between 1 and 21 is much larger than the gap between 101 and 121.
So what I do is first have some idea as to what that spacing should be. And for that, I turn to Weighted Enhanced Game Score.
On the left chart is all the data points of the pitchers, which you can see is decidedly not linear. In fact, that line follows a log function of about: 65 minus 4 * ln(x). You might be afraid of that ln(x). On the right, I changed the x-axis from Ordinal Ranking (1 to 150) to the ln(Ordinal Ranking). ln(1) = 0 and ln(150) ~= 5. Once you see the data laid out like that, you can now see something pretty close to a straight line. In other words, to convert a non-straight line into a straight line, you need to apply some function to your x-axis. In this case, ln(x). Other times you may need to apply an exponential, or a quadratic equation, etc. (Sometimes, you won't get so lucky.)
While this function is 65 minus 4*ln(x) as the best fit for the top 150 SP, the AVERAGE of that is 49. Therefore, since the only purpose of using Game Score here was to give me an idea as to the relationship, I can tweak it to 66 minus 4*ln(x) so that I get an average Game Score of 50. The nice property of ln(1) = 0 is that the intercept value (66 in this case) ends up being the value of my #1 guy. And a 66 Game Score for the top pitcher is about a reasonable number.
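A tiny sketch of the conversion; note how the 1-to-21 gap dwarfs the 101-to-121 gap, which was the whole point.

```python
import math

def rank_to_points(rank: int, intercept: float = 66.0, slope: float = 4.0) -> float:
    """Convert an ordinal SP ranking (1 = best) into Game Score-like points,
    using the log fit described above: points = intercept - slope * ln(rank)."""
    return intercept - slope * math.log(rank)

# top pitcher gets the intercept (ln(1) = 0); the curve flattens as you go down the list
for r in (1, 21, 101, 121, 150):
    print(r, round(rank_to_points(r), 1))
```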
When the Marlins made their comeback against the Cubs in the 2003 playoffs, I described how WPA worked on a play by play (and in the case of the fan, pitch by pitch) basis. A couple of weeks ago, I described Layered Hit Probability, all the various layers we have to go through in order to explain the how/why that a play happened.
Sam Miller lays it all out with what we are up against if we try to go to the ultimate, and describe all the baserunning and fielding involved in a play. And he makes the salient point:
To give credit on all of them means building statistical systems that can make assumptions that hold true in as many cases as possible -- and that don't require hours (and that don't rely on personal opinions) for each of them.
What Sam did is identify what we call Action Events. At every Action Event, we stop the play, and understand the landscape. We identify the run potential (actually win potential) at that point in time. Then we fast forward to the next Action Event and ask the same question. And we capture that change, and assign that change to the change agent(s) between the two Action Events. And on and on we go, much like I described with the Marlins/Cubs, but far more in-depth, as Sam has done. With the key point that we make sure it all adds up, as Sam showed.
And once we have it all broken down for all plays in an inning or a game or a season, we can tally it all. You can see it in the Cubs/Marlins:
The tally:
Prior + SS = +.076
Prior + Alou = +.051
Remlinger = +.001
Remlinger + Fielders = -.016
Dusty = -.017
Fan = -.031
Prior = -.051
Gonzalez = -.184
Farns + OF = -.271
Prior + OF = -.476
Manager = -.017
Fan = -.031
Pitchers = -.368
Fielders = -.502
TOTAL: -.918 (.018 – .936 = .918)
And the kicker is going to be, that once we have a Statcast WAR, while we may be able to explain the PLAYS, we may be introducing a bunch of random variation into a PLAYER. We'll be taking three steps forward on explaining baseball, but we may be running in place in explaining a baseball player. This is why FIP has such a strong footprint, taking the bird's eye view in explaining a baseball player. You have to be careful in conflating the IDENTITY of the players involved in a play with the INFLUENCE of the player (as opposed to the effect of random variation). And this gets into the bittersweet symphony of explaining baseball, which I tried to describe in this two-part thread from a while ago.
I ran a series of polls of the Straight Arrow voters among the 9 player candidates, along with intermediary results, which are close to the current results. To read that, it says that if you were to select ONE player, and one player only, Lou Whitaker would get 34% (using that link) of the votes. It's currently at 35%, and I'll use the most current results for this blog post.
So, you can see that in a "must 1" balloting process, if the threshold is 75%, no one would get inducted. There's too much vote splitting. Ah, but what if it was a "must 2" balloting process? What if everyone had to select 2 players? Could we figure that out? Yes!
Warning, math ahead.
Let's start from the perspective of Lou Whitaker. In a must 1, we already know that 65% did not select him. Of those 65, 12 went to Don Mattingly. So, if we look at the 8 remaining (so, Whitaker, and the other 7, not Mattingly), we take Whitaker's strength (353) and divide it by the remaining strength (1000 minus Mattingly's 119) to get 40.1%. In other words, if Mattingly is off the board, then Whitaker will appear on 40% of those ballots, as the 2nd candidate. We repeat this for each player, and Whitaker will appear on 36% to 40% of those ballots as the second candidate.
And how often do each of those happen? Well, we weight it by the 1st ballot voting rates. For Mattingly that's 12% and for John and Murphy that's 10% each, and so on. And when we do that, we get 39% for Whitaker. In other words, given that Whitaker was NOT the first selection on the ballot, he will be the SECOND selection on the ballot 39% of the time. And since he was NOT on the first selection 65% of the time, we take 65% times 39% and we get 25%. Whitaker appears on 25% of the ballots as the #2 candidate. We already know he appears on 35% of the ballots as the #1 candidate. And so, Whitaker, in a must-2 selection process, will appear on 60% of the ballots.
Mattingly, who was 12% on a must-1 process, is now at 25% on a must-2 process. Garvey, 2.7% on a must-1 is now 6.1% on a must-2.
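For those who want to follow the arithmetic, here is a Python sketch of the must-1 to must-2 step. Only the Whitaker (353), Mattingly (119) and Garvey (27) strengths are taken from above; the other candidates' strengths are made-up placeholders that sum to 1000, so the outputs only roughly reproduce the 60% / 25% / 6% figures.

```python
# Hypothetical strength values (out of 1000) for the 9 candidates; only Whitaker,
# Mattingly and Garvey are from the text, the rest are placeholders.
strengths = {
    "Whitaker": 353, "Mattingly": 119, "John": 100, "Murphy": 100,
    "PlayerE": 90, "PlayerF": 90, "PlayerG": 80, "PlayerH": 41, "Garvey": 27,
}
total = sum(strengths.values())   # 1000

def must2_share(name: str) -> float:
    s = strengths[name]
    p_first = s / total
    # If someone else is the first pick, this player takes his share of the remaining strength
    p_second = sum(
        (strengths[other] / total) * s / (total - strengths[other])
        for other in strengths if other != name
    )
    return p_first + p_second

for name in ("Whitaker", "Mattingly", "Garvey"):
    print(f"{name}: must-1 {strengths[name]/total:.1%}, must-2 {must2_share(name):.1%}")
```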
End of Math
Ok, so this is how it works. I will now turn it over to the aspiring saberists. First figure out the strength values. If you don't know how to do that, then just use what I posted on Twitter. Secondly, repeat what I did for a must-1 and must-2. And then show us must-3 and must-4 and must-5 and must-6.
You'll have plenty of fun if you are a math enthusiast.
Mike's got you covered with a brilliant layout of the relay heard round the world.
The next step is to figure out if it was worth it to even test the Rays. And it was not.
First, let's lay out the numbers.
Which means
Batting team gains .020 wins if safe, loses .102 if out
Breakeven: 84%
.718 x .90
+ .840 x .10
= .730
Home team (defense) win expectancy goes from .738 to .730 in this case
.718 x .80
+ .840 x .20
= .742
Home team (defense) win expectancy goes from .738 to .742 if you send him when you are at best 80% sure.
Why the high breakeven?
The difference between this situation, and most others, is that you do NOT want to make an out at home with 0 or 1 outs. That's because of the power of the potential sac fly. So, that's why you have to be really really sure that the runner will be safe. At least 84% sure. That's Tim Raines trying to steal a base, that's the confidence you need.
And when a runner is thrown out by this much, you know that it was not an 84% chance of being safe. The only way for Altuve to be safe is if the relay was not perfect. And with two guys throwing, that probably sets it at 50% chance of that happening. With two outs, this is the ideal send. With 0 or 1 out, it is not.
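Here is the breakeven arithmetic above in a few lines of Python, using the same win expectancy numbers from the text (.738 if the runner holds, .718 if he is sent and safe, .840 if he is out).

```python
we_hold = 0.738   # defense (home team) WE before the send decision, per the text
we_safe = 0.718   # defense WE if the runner is sent and is safe
we_out  = 0.840   # defense WE if the runner is sent and is out

# From the batting team's side: gains .020 if safe, loses .102 if out
gain = we_hold - we_safe     # .020
loss = we_out - we_hold      # .102
breakeven = loss / (gain + loss)
print(f"breakeven safe probability: {breakeven:.0%}")   # ~84%

def defense_we_if_sent(p_safe: float) -> float:
    """Defense win expectancy if the runner is sent with the given chance of being safe."""
    return we_safe * p_safe + we_out * (1 - p_safe)

for p in (0.90, 0.84, 0.80):
    print(f"sent at {p:.0%} safe: defense WE {defense_we_if_sent(p):.3f} (hold = {we_hold})")
```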
About six months ago, in introducing a simple way to create the Catcher Framing metric, I also showed how to quickly test for park bias in that metric. It actually can apply to any metric. In any sport.
Let's apply this concept to the exit speed of a batted ball. The key to the concept is that we presume no relationship in talent between the home batters (and opposing pitchers) compared to the away batters (and home pitchers). What we do, for each park, is figure the average exit speed for the home batters (or the bottom of the inning) and the away batters (or the top of the inning). In Fenway 2019 for example, the exit speed on the bottom of the inning was 90.7 mph (or +1.9 mph above league average) and in the top of the inning it was -0.1 mph from league average. We repeat this for all 30 parks, for the five years of Statcast.
If there is no correlation at all, and there shouldn't be based on our assumption of fact, we'll get an r close to 0. If we do get a larger correlation, that would point to some sort of park bias. That bias could be the tracking system. It could also be the players responding to the peculiarities of the park. And what do we get? r=0.06. In effect, an r close to 0, and therefore showing no park bias.
Aspiring saberists can use this technique, in any of the sports, to look for biases in metrics, whether measured like I am doing here, or calculated, as I did with the Catcher Framing.
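A minimal sketch of the test, assuming you have already aggregated one row per park-season with the bottom-of-inning and top-of-inning exit speeds relative to league average (the column names are placeholders):

```python
import numpy as np
import pandas as pd

# halves is assumed to have one row per park-season with (hypothetical) columns:
#   park, season, ev_bottom (home batters' avg exit speed), ev_top (away batters'),
# both expressed relative to league average, like the Fenway +1.9 / -0.1 example above.
def park_bias_r(halves: pd.DataFrame) -> float:
    """Correlation between home-half and away-half exit speed across park-seasons.
    Under the no-shared-talent assumption, r near 0 means no park bias."""
    return float(np.corrcoef(halves["ev_bottom"], halves["ev_top"])[0, 1])

# r ~ 0.06 in the Statcast data cited above; a large r would flag a tracking or park effect
```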
A few weeks ago, I ran a poll asking the style of player that fans preferred. Overwhelmingly, fans preferred Strawberry to Mattingly. Strawberry represents the three-true-outcome style (lots of HR, lots of walks, lots of strikeouts, not much in batting average). Mattingly is the opposite: a decent number of HR, not much walks, not much strikeouts, way high in batting average.
Overall, they had a quite similar effect on run generation. If you focus on their stats through their 20s, Mattingly came to bat 4851 times to Strawberry's 5137. So, a 286 PA advantage for Straw. Straw had 242 fewer hits, but 313 more walks, 111 more HR, and 827 more strikeouts. The rate stats tell the story more clearly as to their profile. Here are their BA, OBP, SLG
In other words, similar OBP, similar SLG, and a whopping difference in batting average. Is it better to have a low or high batting average?
Well, we can turn to wOBA. And Standard wOBA is .375 for Strawberry and .372 for Mattingly. In other words, the huge gap in batting average was inconsequential: we can describe their overall production (via wOBA) as being similar, which also matches their similarity in OBP and SLG.
So, that led me to my poll question:
Which is the more productive hitter?
A.
B.
Player A is the Mattingly, Dave Parker type. Fred Lynn and Jim Rice too if you wish.
Player B is the Strawberry, Mike Schmidt type. Eric Davis (without the speed) if you wish.
We can construct hitting lines of 700 PA as follows:
Those lines give us identical OBP/SLG of .364/.511, which I am arguing (not really arguing, really stating as fact) is identical production, even if one guy has a .315 BA and the other has a .260 BA.
Indeed, their Standard wOBA is .379 for Quint Mattingly, and .378 for Quint Strawberry.
How does it happen that the tradeoff is basically even? Quint Mattingly has 49 more singles while Quint Strawberry has 49 more walks. In terms of run production, that gives Mattingly 7-8 more runs. Quint Mattingly has 12 more doubles to Quint Strawberry's 12 more HR. That gives Strawberry 7-8 more runs.
In other words, giving away 49 singles for 49 walks is balanced by getting 12 more HR for 12 fewer doubles. Or if you wish -4 singles, -1 doubles = +4 walks, +1 HR. That's the tradeoff.
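A quick check of that tradeoff with generic linear weights values; the run values below are ballpark standards, not necessarily the exact coefficients behind Standard wOBA.

```python
# Approximate linear weights run values per event (assumed generic values, not from the post)
run_value = {"1B": 0.47, "2B": 0.77, "HR": 1.40, "BB": 0.31}

mattingly_edge  = 49 * (run_value["1B"] - run_value["BB"])   # 49 singles traded for 49 walks
strawberry_edge = 12 * (run_value["HR"] - run_value["2B"])   # 12 homers traded for 12 doubles

print(f"singles-for-walks edge:  {mattingly_edge:+.1f} runs")   # ~ +8 for Quint Mattingly
print(f"homers-for-doubles edge: {strawberry_edge:+.1f} runs")  # ~ +8 for Quint Strawberry
```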
And that's why Quint Mattingly = Quint Strawberry. And that's why the batting average is inconsequential. And that the vast majority of voters used the higher batting average as essentially the tie-breaker is why we should stop talking about batting average. It's a bias that clouds our view of players.
Also: check out the take from Ben at Fangraphs.
I too thought it was ridiculous that one out of eight men thought they could.
But, the key is the competition setup. Is it a one-shot deal? Then, yes, that is totally laughable.
But, if this was a 2-set match? Things are different. This is where we rely on good luck. A game is made up of at least 4 points. (The way tennis is set up, you need to get 4 points and win by 2 in order to win a game.) You need to win 6 games to win a set, and two sets to win a match. And serves alternate.
So, for Serena to win the match on a shutout, she has to score 48 points to my zero. That would mean scoring 24 points by her serving, and she needs to score 24 points on my serve.
The only conceivable way for Serena to not score on her own serve is for her to double-fault. The chance that she would do that against me is probably 1 in a thousand. Or 99.9% she won't double-fault on any serve. So, .999^24 she won't double-fault, or 97.6%.
What is the chance she won't return my 24 serves? Let's see, she'll get 12 points simply because I'll double-fault. In the next 11 serves, she'll hit them back 99.99% of the time. And on the last serve, I'll Nick Kyrgios an underhand serve. She'll return that one 99% of the time.
So, she'll return my serve .9999^11 x .99 = 98.9%.
And 97.6% x 98.9% = 96.5% chance that Serena will get a shutout.
So, I think I have a 3.5% chance of scoring one point, if I'm given 48 opportunities to do so. That's 28:1 odds.
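Here is that arithmetic spelled out, with all the per-point probabilities being the guesses above:

```python
# The arithmetic above, spelled out. All the per-point probabilities are the text's guesses.
p_df_serena = 0.001          # chance Serena double-faults a given service point against me
p_no_point_on_her_serve = (1 - p_df_serena) ** 24            # ~.976

p_return_normal = 0.9999     # she returns one of my ordinary serves (11 of them land in)
p_return_underarm = 0.99     # she returns the one underhand serve
# 12 of my 24 service points are conceded by double fault, so only 12 serves reach her
p_no_point_on_my_serve = p_return_normal ** 11 * p_return_underarm   # ~.989

p_shutout = p_no_point_on_her_serve * p_no_point_on_my_serve         # ~.965
p_one_point = 1 - p_shutout                                          # ~3.5%
print(f"chance of a shutout: {p_shutout:.1%}, chance I score: {p_one_point:.1%}")
print(f"fair odds against me: {p_shutout / p_one_point:.0f}:1")      # ~28:1
```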
This would mean that I would be willing to bet 1000$, with the chance of winning 28,000$. And I think there's no way I would do that. You can do all the math I did, but the reality is that if I'm laying out 1000$ that I can get one point out of 48 tries on Serena, I'd expect at least a 100,000$ payout. That's 100:1 odds. That means I really have a 1% chance of getting a point. And I think that's probably being optimistic.
Something interesting happens with sports: opportunities are NOT handed out randomly. This little quirk actually is fairly critical. You see, the way Regression Toward The Mean (or the better term, Reversion To Form) works is that you need to know the population mean. But, since most of the playing time is given out to the better players, those players get more weight when you calculate the league average mean. What we actually want is the unweighted average of our population.
Something interesting happens when you do that. The classic way is to treat the league average mean as the population mean. And so, you would provide 200-300 PA of "Ballast" for your prior, the amount of weight to the population mean, to add to your observations for each player, to come up with the posterior, the True Talent Level of each player.
But, if instead of the actual league average (which is overweighted to the better players) you had the simple league average, an unweighted population mean, the amount of ballast is going to shrink considerably, probably under 100 PA. The population mean will also go down. Instead of say a .320 wOBA, it'll go down to say .300, maybe even lower. For guys with 700 PA, things won't change much. For example, .400 wOBA observed on 700 PA will give you a True Talent estimate of .379 with 250 PA of .320 wOBA ballast. But with 100 PA of .290 wOBA ballast, it's a .386 estimate.
Where it's REALLY going to matter is with guys with few PA. Those guys with the classic method would give you .320. The new method will give us .290. And naturally, the newer method is better. After all, the reason they have so few PA is BECAUSE we know they are below average hitters. We can't just ignore that critical piece of information.
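The ballast arithmetic, for anyone who wants to play with it; the low-PA example at the bottom uses made-up numbers (.250 over 50 PA) just to show how much the choice of prior matters there.

```python
def true_talent(obs_woba: float, pa: int, prior_woba: float, ballast_pa: int) -> float:
    """Posterior estimate: observation pulled toward the population mean by the ballast."""
    return (obs_woba * pa + prior_woba * ballast_pa) / (pa + ballast_pa)

# The two versions of the 700 PA example above
print(true_talent(0.400, 700, prior_woba=0.320, ballast_pa=250))   # ~.379, classic league-mean prior
print(true_talent(0.400, 700, prior_woba=0.290, ballast_pa=100))   # ~.386, unweighted-population prior

# A made-up low-PA hitter, where the prior dominates
print(true_talent(0.250,  50, prior_woba=0.320, ballast_pa=250))   # ~.308
print(true_talent(0.250,  50, prior_woba=0.290, ballast_pa=100))   # ~.277
```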
So, this is for aspiring saberists: focus on this framework to come up with better estimates, a better process, than just relying on the league mean to represent the population mean.
***
This is also the idea behind the WARcels by the way. If you think about what I did, and why I did what I did, you will see that I did apply regression, but I did NOT use league average. I in fact, implicitly, used the replacement level. The true population mean is going to be somewhere between the replacement level and the weighted league average mean.
The always fantastic Probability Jock Kincaid gives us a primer, with carefully constructed scenarios. You can especially see the care he takes when he notes that one SD = .072 in talent using historical data, then switches to one SD = .060 for the more recent decades.
Anyway, I loved the way he started, by going from a purely unweighted coin to a totally weighted one, then choosing in-between.
Ballast
Regression Toward The Mean (RTTM) is an important concept, a critical concept. But boy what a terrible name. Michael Lopez proposed Reversion To Form, which is a definite improvement. Regression has its own non-statistical definition, while Reversion really is about resetting the expectation from the observation. And To Form, as opposed to Toward The Mean, is also better, as RTTM makes it seem that the player's talent is changing toward the population mean. Reversion To Form is really about setting our expectation of his talent level given the observation.
Bill James since the 1980s (and I can attest to this, since I remember everything he wrote, and I started reading him in the 1980s) has always used the term Ballast. In a nod to his brilliance, he knew he needed to do something, without the concept of Bayes being at the forefront. Which is why Bayes is beautiful, because we all use it, even without formality.
Priors
In order to establish the true rate of something based on the observation of that thing, we need to know something about the population that this thing was drawn from. This is called your Prior Distribution. The Prior requires both the mean and an amount. For something like OBP, the mean is the league mean, say .330, but more important is the regression amount. This is what Bill James calls Ballast. It's how much your observation of a player needs to be pulled toward the population of all players. For OBP, historically, it's in the 200-300 PA range. For something like K/PA, it's much lower, while for something like BABIP, it's much higher. In other words, the amount of Ballast you need is linked to how much that observation tells you about the player.
In sports, the skill that requires the least amount of Ballast is free throw shooting. As a general rule, the fewer "layers" there are between the physical effort required and the end-result, the less Ballast you need. For free throws, there's the player at the free throw line, and the basket. There's no defense, there's no varying distance. In addition, because players are not chosen on their free throw skills (think Shaq), there's a naturally wide talent base to choose from. The wider the talent base, the less Ballast needed as well.
For things like BABIP, it's a crucial skill for a pitcher, so they are selected for it. (If a pitcher is hit too hard, he won't make it to MLB.) So, we already have a tight range. But in addition to that, you have the batter, the park, and the fielders. There are a lot of layers to get through from pitcher physical skill to outcome. We need a lot of Ballast.
Swing Rates
How about swing rates? We break up the area at the plate into four regions: Heart, Shadow, Chase, and Waste.
A hitter has a hitting approach. A hitter does not really change his hitting approach, since that hitting approach is what has brought him to MLB in the first place. However, he will tinker, and as the years go by, he will start to adapt. So, while we suspect we're going to need more Ballast than free throw shooting, we also think he won't need as much as with Strikeouts.
Heart of the Plate
Let's look at the data. For pitches in the Heart of the Plate, hitters will swing 70-75% of the time, with one standard deviation being about 6% (among our sample hitters, who averaged 480 PA). Technically, I should be using number of pitches, not PA, but PA is an easier standard if comparing to other skills. As it so happens, there's about a 1:1 relationship between number of pitches in Heart of Plate and number of PA.
Anyway, since one standard deviation from random variation alone is about 2% and our observed spread is 6%, that gives us a z-score of close to 3. Our Reversion To Form (or Regression Toward The Mean) is 1/z^2, or close to 12%. The Ballast (or Regression Amount) is .12/.88*480 = 65. In other words, we need to add 65 PA of Ballast to an observed swing rate for pitches in the Heart of the Plate.
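Here is that calculation as a small function; with these exact inputs it lands on about 63 PA rather than 65, because the text rounds the regression to 12%.

```python
import math

def ballast_pa(observed_sd: float, p: float, n_per_pa: float, pa: int) -> float:
    """Ballast (in PA) from the spread of observed swing rates:
    random SD = sqrt(p*q/n), regression = (random/observed)^2 = 1/z^2, ballast = r/(1-r) * PA."""
    n = n_per_pa * pa                          # pitches seen in the region
    random_sd = math.sqrt(p * (1 - p) / n)
    r = (random_sd / observed_sd) ** 2         # the 1/z^2 in the text's terms
    return r / (1 - r) * pa

# Heart of the Plate, per the numbers above: ~72.5% swing rate, observed SD ~6%,
# ~1 Heart pitch per PA, hitters averaging 480 PA
print(round(ballast_pa(observed_sd=0.06, p=0.725, n_per_pa=1.0, pa=480)))   # ~63 here; ~65 in the text
```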
Shadow, Chase, Waste
Shadow Zone also requires about 65 PA of Ballast. However, since there are 1.7 pitches per PA in the Shadow Zone, the amount of Ballast of Pitches is closer to 40 Pitches.
Chase Region requires close to 40 PA of Ballast, or about 45 Pitches of Ballast.
Waste Region is 45 PA of Ballast or 125 Pitches of Ballast. In other words, how a batter swings in the Waste Region is not as indicative of his approach as his swings in the other regions are.
For the sake of simplicity, let's add 50 PA of Ballast for each Region.
Adjacent Regions
Now, we can also learn about each Region by looking at the other three regions. After all, how a hitter approaches the Heart of the Plate can be informed by how he approaches the Shadow, Chase, and Waste regions.
As it turns out, the weighting is close to:
The other regions are more iffy. For guys like Votto, the less you Chase, the more you swing in the Heart. For guys like Baez, the more you Chase, the more you swing anywhere.
Repeating for the other three regions, and we have the following for Shadow:
In other words, the surrounding regions inform a lot, but Chase is more indicative of the approach to Shadow than Heart is.
Chase:
Chase is fairly equally informed from the other two.
Waste:
So, there you have it. In order to establish the skill level of a hitter at swinging in each of the 4 regions, you apply a 50 PA Ballast, along with the weighting of the adjacent skills. The aspiring saberist can of course focus more on pitches than PA, and be a bit more rigorous in their approach. And especially focus on players who might have a new "established" change in hitting approach.
Here we go. If you want to see statistics being misapplied, look no further than what you will read about the SAT.
Any adjustment made is going to be biased in some form or other. I'll make the comparison with hockey, so as to not offend anyone. At some point, and probably this is still true, half the first round picks were Europeans. But, much less than half of NHL players are Europeans. Why is that? Because the NHL goes after the best European players. Once you get down to the 3rd and 4th line players, there's a cost/benefit applied: what does it cost to scout and bring over a European player, when there's someone almost as good in Canada and USA? In other words, Europe sends over disproportionately their best players, compared to what Canada and USA sends. That means that the AVERAGE European player in the NHL would have to be better than the average Canadian or American player. It's a selection bias.
If you just look at the first wave of Russian players in the NHL, you'd think that Russia only had hall of fame caliber players. Mogilny, Bure, Fedorov... Larionov, Makarov, (Krutov), Fetisov, Kasatonov. Again, selection bias.
How do you know you don't have a selection bias? When the average of that class is the same as all the other classes. LHP v RHP? If you check, they will have the same ERA and same FIP. (I haven't checked in many many years, so if it is not true, then we have a market inefficiency.) MLB players born in California compared to NJ? They should have the same WAR per PA and WAR per IP (though Trout is going to break the rule, so be careful with sample size too). If not, there's a market inefficiency. (Or like NHL, a cost/benefit that crystallizes this inefficiency.)
So, when you look at the SAT, be very careful especially for selection bias.