WAR
Saturday, December 25, 2021
Baseball players accumulate bases and outs. Bases and outs are the building blocks of runs, which are the building blocks of wins.
Runs are a very easy checkpoint. We all understand what scoring 5 runs and allowing 4 runs means, or scoring 6 and allowing 2. And assigning value to players in the form of runs is essentially the easiest target. The actual currency, however, is wins. Even more than knowing that a team scored 700 runs (RS) and allowed 600 runs (RA) in a season, what we ultimately care about is their wins and losses. And so we need a way to translate runs into wins.
The simplest rule is that every additional 10 runs at the seasonal level leads to 1 win. In other words, the runs-per-win conversion is 10:1. While this is a very useful rule of thumb, it is only true in “average” situations. The more extreme you get, the less it holds. In a low run scoring environment (think Pedro on the 1962 Mets at the Astrodome), the number of runs it takes to buy a win is much lower. And in a high run scoring environment (think of the worst pitcher with the 1927 Yankees at Coors), the number of runs per win is much higher. In addition, the larger the gap in talent between the team and their opponent, the higher the runs-per-win conversion.
So, we can come up with a better rule of thumb:
Runs Per Win
= 3.0
+ 0.8 * (RS + RA)
+ 0.4 * |RS - RA|
For example: if you score 5 runs and allow 3 runs per game, then the RPW is 10.2, or 3 + 0.8 * 8 + 0.4 * 2. It works spectacularly well in the cases I tested, with run scoring per team from 2 to 6 runs per game: the error in RPW is less than 0.2 for any combination of RS and RA. (RS - RA is the absolute difference.)
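As a quick way to play with the rule of thumb, here is a minimal Python sketch. The function names are mine, not anything from the original post, and RS and RA are assumed to be per-game figures, as in the 5-and-3 example.

```python
def runs_per_win(rs: float, ra: float) -> float:
    """Rule-of-thumb runs per win, from runs scored and allowed per game."""
    # The last term uses the absolute difference between RS and RA.
    return 3.0 + 0.8 * (rs + ra) + 0.4 * abs(rs - ra)


def runs_to_wins(run_diff: float, rs: float, ra: float) -> float:
    """Convert a seasonal run differential into wins in a given environment."""
    return run_diff / runs_per_win(rs, ra)


# The worked example from the text: score 5, allow 3.
print(round(runs_per_win(5, 3), 1))

# A low-scoring and a high-scoring environment, to see the conversion move.
for rs, ra in [(2, 2), (4.5, 4.5), (6, 6)]:
    print(rs, ra, round(runs_per_win(rs, ra), 1))
```

Because the gap term is an absolute value, swapping RS and RA gives the same conversion rate.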
If you are interested in more math, and you really really really have to love math, you can jump the line.
WAR
Friday, December 24, 2021
WAR for Non-Pitchers
WAR = PA*(wOBA-0.277)/12.1
Plus adjustments for:
- Year
- League
- Park
- Runs Per Win
- Opponent quality
- Fielding
- Positional Adjustment
- Leverage Situation (base, out, inning, score)
- Replacement Level
WAR for Pitchers
WAR = 0.0572 * IP - ER/10
Plus adjustments for:
- Year
- League
- Park
- Runs Per Win
- Opponent quality
- Fielding support
- Role Adjustment (SP v RP)
- Leverage Situation (base, out, inning, score)
- Replacement Level
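The two starting formulas, before any of the listed adjustments, can be sketched in Python as follows. The function names and the sample stat lines are hypothetical, chosen only to show the arithmetic.

```python
def naive_war_nonpitcher(pa: float, woba: float) -> float:
    """Starting point for non-pitchers: WAR = PA * (wOBA - 0.277) / 12.1."""
    return pa * (woba - 0.277) / 12.1


def naive_war_pitcher(ip: float, er: float) -> float:
    """Starting point for pitchers: WAR = 0.0572 * IP - ER / 10."""
    return 0.0572 * ip - er / 10


# Hypothetical stat lines, for illustration only.
print(round(naive_war_nonpitcher(pa=600, woba=0.350), 1))
print(round(naive_war_pitcher(ip=180, er=70), 1))
```

None of the adjustments (year, league, park, and so on) are applied here; this is only the unadjusted baseline that those layers would modify.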
I'll update this thread as the mood strikes me…
Update: The mood has stricken me, so all the above points now have links.
Last year, I created NaiveWAR, which used none of the above, but opened the door for using all of the above. You can check out how I applied the concept of WAR in conjunction with Individualized Won-Loss Records (The Indis) to the 2008 Orioles, to ensure that the sum of all the players added up to 68 wins and 93 losses, the actual W-L record of the Orioles. This 2008 Orioles thread is an excellent jumping off point, where you can start here with NaiveWAR, then start to add in all the above layers to come out with an AllOutWAR.
For those who don't want to read all the above links, I had this other thread last year that gave a high level view of most of these concepts.
WAR
Tuesday, November 16, 2021
In 2021, deGrom was cruising into an easy Cy Young win, when he was stopped at 92 IP, and a 1.08 ERA.
Julio Urias had an excellent 2021 season, with 185.2 IP and a 2.96 ERA.
The difference between the two pitching lines is 93.2 IP, and a 4.80 ERA.
The question on the table is thusly: IGNORING the post-season, if you knew what their 2021 seasons would give you, which season would you have preferred for your team?
When I asked that question of my followers, it landed almost 50/50: 46% preferred deGrom, 48% preferred Urias, and 6% said it was too close to call. Even with the option of a tie, 94% still felt strongly enough to favor one pitcher over the other. In other words, most people think there’s no way that you can balance quality and quantity, though they are equally divided on whether quality or quantity should take precedence.
Fast-forward to the answer: the Crowd called it right, it is too close to call, even if only 6% of the Individuals thought it was too close to call.
***
What can we do to get the actual answer?
We’re going to convert IP and ERA into Wins and Losses. And I’m going to do this step by step by step, in excruciating detail. Or, if you like Math, it will be in loving detail.
Here we go…
We’ll make our first assumption that both pitchers pitched in similar environments. We’ll say that the average opponent would score 4.5 runs per 9 IP. So, given 92 IP, that suggests the environment would allow 46 runs for deGrom. Since the Mets actually allowed 14 runs with deGrom, that’s a whopping 32 runs saved by deGrom compared to the average.
Doing the same for Urias, his 67 runs allowed is compared to the average baseline of 93 runs, and so Urias saved 26 runs, compared to the average.
So, that ends it right? deGrom saves 32 runs compared to the average and Urias is 26. Except, the average has value. Being a .500 pitcher who makes up the 93.2 IP difference is worth something, even if that is 0 runs above average. Average therefore can’t be worth 0, it has to be worth more than 0. The baseline level therefore has to be BELOW average. But, how much below average?
Let’s continue…
deGrom had 15 starts which would suggest around 135 innings needed. Since deGrom had 92 of those, there’s another 43 innings that need relief. Applying our average pitcher and their 4.50 RA/9, and that adds 21.5 runs. So in deGrom games, their team allows 35.5 runs, or 2.37 runs per game.
Is it a leap of faith for you to accept that a team that scores 4.5 and allows 2.37 runs per game will win about 75% of the time? Leap over the next paragraph in that case. Or read the next paragraph.
Well, I promised the details, so if you need them, here we go as well. 4.5 plus 2.37 is 6.87 runs per game. If you raise that to the power of 0.28 you get 1.72, which we call the PythagenPat exponent. 4.5 divided by 2.37 is a runs ratio of 1.90. And the runs ratio raised to the PythagenPat exponent is 3.0, or the win ratio (wins divided by losses). And if you win 3 games for every one you lose, that’s a .750 win%. And in 15 games, that’s 11.3 wins and 3.7 losses.
Repeating all the same steps for Urias and his team, we get a .588 win%, or 18.8 wins and 13.2 losses.
Here’s where we are so far:
- 11.3 W, 3.7 L, 15 Games, deGrom games
- 18.8 W, 13.2 L, 32 Games, Urias games
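For those who want to rerun the steps so far, here is a minimal Python sketch under the same assumptions as the text: a 4.5 runs per 9 IP environment, average relievers covering the rest of each 9-inning game, and the 0.28 PythagenPat exponent. The function names are mine.

```python
def innings(whole: int, outs: int = 0) -> float:
    """Convert baseball notation (185.2 means 185 and 2/3) to true innings."""
    return whole + outs / 3


def pythagenpat_win_pct(rs: float, ra: float, k: float = 0.28) -> float:
    """Win% from runs scored and allowed per game."""
    exponent = (rs + ra) ** k           # the PythagenPat exponent
    win_ratio = (rs / ra) ** exponent   # wins divided by losses
    return win_ratio / (1 + win_ratio)


def team_record_in_starts(ip, runs_allowed, starts, league_ra9=4.5):
    """Team W-L in a pitcher's starts, with average relief and average offense."""
    relief_ip = 9 * starts - ip
    relief_runs = relief_ip * league_ra9 / 9
    ra_per_game = (runs_allowed + relief_runs) / starts
    wins = pythagenpat_win_pct(league_ra9, ra_per_game) * starts
    return wins, starts - wins


degrom = team_record_in_starts(innings(92), runs_allowed=14, starts=15)
urias = team_record_in_starts(innings(185, 2), runs_allowed=67, starts=32)
print("deGrom games: %.1f W, %.1f L" % degrom)
print("Urias games:  %.1f W, %.1f L" % urias)
```

The inputs (14 and 67 runs allowed, 15 and 32 starts) are the ones quoted in the text; everything else follows from the stated assumptions.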
***
A good chunk of all that has nothing to do with our pitchers. So, we need to back out all the non-deGrom/Urias contributions.
First, we remove the non-Pitcher contributions. About 4/7ths of the contributions come from the non-pitchers, which means 8.6 games for nonpitchers in deGrom games. That leaves us with 6.4 games to pitchers in deGrom games. Since deGrom pitched 68% of the innings, the other 32% of the 6.4 games goes to the relief pitchers, or 2.0 games. And 8.6 plus 2.0 means 10.6 games are assigned to non-deGrom players in deGrom games. And since those are average players, that means 5.3 wins and 5.3 losses are assigned to non-deGrom.
So we now have this:
11.3 W, 3.7 L, 15 Games, deGrom games
5.3 W, 5.3 L, 10.6 Games, non-deGrom players
===== ====== ======
6.0 W, -1.6 L, 4.4 Games, deGrom
And that is deGrom’s Individualized Won-Loss Record. Or The Indis.
Urias looks like this:
18.8 W, 13.2 L, 32 Games, Urias games
11.6 W, 11.6 L, 23.2 Games, non-Urias players
===== ====== ======
7.2 W, 1.6 L, 8.8 Games, Urias
So, our Individualized Won-Loss Records look like this:
- 6.0 W, -1.6 L, 4.4 Games, deGrom Indis
- 7.2 W, 1.6 L, 8.8 Games, Urias Indis
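The backing-out step can be sketched in Python like this. It assumes, as the text does, that 4/7ths of each game goes to non-pitchers, that the pitcher's share of the rest is his share of the innings, and that everyone else is exactly average. The function name and argument layout are my own.

```python
def indis(team_wins, team_losses, pitcher_ip, nonpitcher_share=4 / 7):
    """Individualized W-L record for a starter, from the team record in his starts."""
    games = team_wins + team_losses
    pitcher_games = games * (1 - nonpitcher_share)  # games credited to all pitchers
    innings_share = pitcher_ip / (9 * games)        # his share of the innings
    own_games = pitcher_games * innings_share
    other_games = games - own_games                 # non-pitchers plus relievers
    # Everyone else is assumed average, so they split their games evenly.
    wins = team_wins - other_games / 2
    losses = team_losses - other_games / 2
    return wins, losses, own_games


degrom = indis(11.3, 3.7, pitcher_ip=92)
urias = indis(18.8, 13.2, pitcher_ip=185 + 2 / 3)
print("deGrom Indis: %.1f W, %.1f L, %.1f Games" % degrom)
print("Urias Indis:  %.1f W, %.1f L, %.1f Games" % urias)
```

A negative loss total, as with deGrom, simply means the pitcher was good enough to cancel out some of the losses assigned to his average teammates.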
Which do you prefer?
Here’s where *you* come in. Let’s say, for you, you want a .400 win% record as where value starts. Anything under a .400 win% and you think that has no value. Even worse, it has negative value. Anything above .400 is positive value.
In that case, you are saying that if you have 4.4 games, then a .400 record is 1.8 wins. And so deGrom at 6.0 wins is +4.2 wins above your zero-baseline. Similarly, Urias is 7.2 wins minus 3.5 wins is +3.7 wins above your zero-baseline. In other words, deGrom contributed 0.5 more wins than Urias. Under the .400 assumption anyway.
However, let’s say you think playing time is important, and you think value starts with even just a .200 win%. That value accumulates above the .200 level and becomes negative below .200.
In that case, deGrom’s 6.0 wins is compared to 0.9 wins (which is 0.2 x 4.4), or +5.1 wins above the zero-baseline. Urias is 7.2 wins minus 1.8 wins, or +5.4 wins above the zero-baseline. In other words, Urias contributed 0.3 more wins than deGrom. Under the .200 assumption anyway.
***
So, what *is* the zero-baseline? Well, we don’t know exactly, but we have a pretty good estimate that it’s around .300 win%. Our expectation is that’s where the value starts. We have many different methods that we’ve used over the last 20 years that point toward .300. Maybe it’s .270, maybe it’s .330, maybe in-between. But it’s somewhere close to .300.
If we go with .300, this is what we get:
- deGrom has 6.0 wins compared to 1.3 wins or +4.7 wins above the zero-baseline
- Urias has 7.2 wins compared to 2.6 wins or +4.6 wins above the zero-baseline
And so, we have deGrom and Urias virtually tied. Indeed, if you follow this process to the decimal, their difference, under the .300 baseline assumption is 0.05 wins in favor of deGrom. Which really means tied.
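To see how the choice of zero-baseline flips the answer, here is a small Python sketch using the rounded Indis from above (6.0 wins in 4.4 games, 7.2 wins in 8.8 games). The helper names are mine, and because the inputs are rounded, the break-even point it prints is only approximate.

```python
def wins_above_baseline(wins: float, games: float, baseline_pct: float) -> float:
    """Wins beyond what a baseline-level player would win in the same games."""
    return wins - baseline_pct * games


def break_even_baseline(wins_a, games_a, wins_b, games_b):
    """Baseline win% at which two players come out exactly tied."""
    return (wins_b - wins_a) / (games_b - games_a)


degrom = (6.0, 4.4)  # rounded Indis wins and games
urias = (7.2, 8.8)

for baseline in (0.200, 0.300, 0.400):
    d = wins_above_baseline(*degrom, baseline)
    u = wins_above_baseline(*urias, baseline)
    print("baseline %.3f: deGrom %+.1f, Urias %+.1f" % (baseline, d, u))

print("tied at about %.3f" % break_even_baseline(*degrom, *urias))
```

The lower the baseline, the more the extra playing time counts, which is why Urias leads at .200 and deGrom leads at .400.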
And since the Crowd pretty much split evenly between deGrom and Urias, then the .300 win% baseline satisfies the Crowd. While individually, the implied baseline was closer to either .200 or .400 (no consensus), as a Crowd they intuitively landed at .300. Which also matches the level we use in WAR.
And that is how we can say that deGrom and Urias both had similar contributions in the 2021 regular season. And since individuals can’t give us consensus, this is also why 94% of folks disagree with that conclusion. Even if as a group, they are 100% in agreement.
Monday, October 04, 2021
Just posting this as a reference to the Twitter thread.
WAR
Friday, October 01, 2021
I will use Carl Yastrzemski (Yaz) and Willie Stargell (Pops) as my step by step example. I will also rely principally on Fangraphs’ version of WAR (fWAR) as their presentation is ubiquitous. However, I will also reference the Reference version of WAR (rWAR).
Fangraphs has a fantastic feature where it lets you customize your view for select players. This link therefore shows you the data limited to Yaz and Pops. The key columns are at the end, OFF and DEF.
OFF is the Runs above average for batting and baserunning. As you can see, both players are nearly identical at around +460 runs above average for their careers.
DEF is Runs Above Average for defensive contributions. That one is a little tricky, as it includes a positional adjustment. Basically, a great fielding 1B, say I’m Keith Hernandez, is +119 runs above the average fielding 1B. But the average fielding 1B doesn’t have the same defensive impact as the average fielder (SS, 2B, 3B, C, OF). We apply a positional adjustment so that we can compare a SS to a 1B. In the case of Hernandez, the positional adjustment is, by pure luck, -119 runs. So Keith’s defensive contribution is +0 runs above an average fielder.
Derek Jeter on the flip side is -137 runs compared to the average fielding SS. But SS gets a big positional adjustment. In Jeter’s case, that’s +117 runs. So Jeter ends up having a defensive contribution of a modest -20 runs.
In other words, the positional adjustment makes it so that we can put all the fielders onto the same scale, and pretty much assure that the best fielding 1B has slightly more defensive impact than the worst fielding SS.
Getting back to Yaz and Pops
Yaz has a DEF of 0 runs and Pops is minus 185 runs.
So, we have this tally for OFF+DEF, relative to average:
- +463 runs Yaz
- +273 runs Pops
The translation to wins is a bit convoluted if you are looking for precision, but it’s around 9-10 runs per win. In the case of Yaz specifically, it’s 9.25. For Pops, it’s 8.96. So we take Yaz at +463, divide by 9.25 and we get +50 WAA. Doing the same for Pops, +273 divided by 8.96 gives +30 WAA. So, we have this as their WAA:
- +50 WAA Yaz
- +30 WAA Pops
I’ll take a sidebar here to show you how it looks on Baseball Reference:
Even though each of their implementations of WAR are unique, they essentially agree on their win impact relative to the average player.
We can convert those WAA values into an Individualized Won-Loss Record, what I call The Indis. The process is a bit convoluted, but the basic idea is that we assign game slices to each player. We give out 4/7ths (0.57) of the game slices to position players and 3/7ths (0.43) to pitchers. In a typical game, there are about 38 plate appearances (PA). So, if 38 PA get 0.57 game slices, then each PA gets 0.015 game slices. (Roughly speaking anyway, if you don’t worry about precision levels.)
Yaz had 13991 PA, so multiplying by 0.015 means Yaz had 210 game slices. Or 210 games. An average player would therefore have a 105-105 Won-Loss Record. Yaz is +50 WAA. So that means his Indis is 155-55. Doing the same for Pops (135 games) gives us 98-37. So we have this:
- 155-55 Yaz Indis
- 98-37 Pops Indis
Now, how do we compare players like that? Those of you out there that are happy with WAA can continue to use WAA and stop reading. However, suppose we had this:
Wins-Loss
100-100 (0 WAA)
50-40 (+5 WAA)
Now you have a problem. Would you prefer the player with the 100-100 career or the 50-40 career? Who contributed more? With WAA, you have an implicit “zero” point of average (or .500 win percentage). If you are a .490 player, the more you play, the more negative your contributions. That is obviously ludicrous on its face. You can’t have everyone above average! This is where the concept of replacement level comes in. We establish the “zero point”: play below it, and the more you play, the more negative your contribution. And we establish that point as a .300 win%. If you contribute below a .300 win%, you are a negative contributor. In a practical sense, playing at under .300 will not only limit your playing time, but it will also send you down to the minors. Free agents at this level get paid the league minimum, or they sign minor league free agent contracts.
Getting back to our players. Yaz had 210 games, so his baseline level is .300 times 210, or 63 wins (and 147 losses). Pops is 40-95. We have this as our benchmark for our players:
- 63-147 Yaz Baseline Replacement Level
- 40-95 Pops Baseline Replacement Level
Now we can get to WAR. First Yaz:
155 Wins (and 55 Losses) Yaz Indis
63 Wins (and 147 Losses) Yaz Baseline Replacement Level
————
+92 WAR
And Pops:
98 Wins (and 37 Losses) Pops Indis
40 Wins (and 95 Losses) Pops Baseline Replacement Level
————
+58 WAR
And there you go, Yaz has a 92 WAR and Pops has a 58 WAR using this crude method.
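The crude method fits in a few lines of Python. This is only a sketch of the steps described above: 0.015 game slices per PA, a .500 record plus WAA for the Indis, and a .300 replacement baseline. The names are mine, and Pops is entered at the 135 games quoted in the text since his PA total is not given here.

```python
SLICES_PER_PA = 0.015     # roughly 0.57 game slices spread over 38 PA
REPLACEMENT_PCT = 0.300   # the zero-baseline win%


def indis_from_waa(games: float, waa: float):
    """An average player is games/2 wins and losses; WAA shifts both."""
    return games / 2 + waa, games / 2 - waa


def war_from_waa(games: float, waa: float, replacement_pct=REPLACEMENT_PCT):
    """Indis wins minus the wins a replacement-level player gets in the same games."""
    wins, _losses = indis_from_waa(games, waa)
    return wins - replacement_pct * games


yaz_games = 13991 * SLICES_PER_PA
yaz_waa = 463 / 9.25      # runs above average / runs per win
pops_games = 135
pops_waa = 273 / 8.96

print("Yaz  Indis %.0f-%.0f" % indis_from_waa(yaz_games, yaz_waa))
print("Pops Indis %.0f-%.0f" % indis_from_waa(pops_games, pops_waa))
yaz_war = war_from_waa(yaz_games, yaz_waa)
pops_war = war_from_waa(pops_games, pops_waa)
print("Yaz WAR %.1f, Pops WAR %.1f" % (yaz_war, pops_war))
```

Carrying the decimals rather than rounding at each step lands within about a win of the rounded figures in the text. Algebraically, this WAR is just WAA plus 0.2 wins for every game slice.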
Fangraphs shows this:
And Reference shows this:
So there you go, this is how you can calculate WAR, or simply calculate WAA.
And the power here is that you can apply this to ANY SPORT WORLDWIDE. I’ve done it for hockey, I’ve done it for basketball. I’ve even shown someone privately how to do it for volleyball (and while I love playing volleyball, I don’t know anything about it). So, soccer, cricket, football, they can all follow my WAR framework. Fangraphs and Reference each have their own implementation of my framework. The real effort is actually doing and producing all the work; so I can get 1% of the credit for what they did, but 99% of the value is actually doing the work. I can draw you the blueprints to a house, but the real value is actually building the darn house and dealing with all the peculiarities that the blueprints will kind of gloss over. So, we should all be thankful that we have fWAR and rWAR so readily at our disposal.
Saturday, February 27, 2021
As we know, WAR is perfect in its framework. Here I’ll describe the five pillars of WAR:
- component driven
- relative to average
- adjust for playing context (roles/responsibilities)
- adjust for playing environment (site/era)
- average relative to a minimal baseline
Let’s take each one at a time.
- Component driven means that you are able to keep splitting up what you are measuring into components. The more you can keep breaking it up into its components, the better. In baseball, that could be batting and baserunning. It could mean batting by swinging v batting by taking. It could mean putting in play v not putting in play. Or any combination thereof. We want components because we want to be able to figure out the contributions of the player to those components. And by contributions I mean influence or impact, not simply incidental. Just because you can identify a player being on the field or even involved in a play doesn’t mean that he had any impact on the play.
- Relative to average is important, as this sets the context. It also builds in a natural guardrail. You don’t want everyone to be above average. Having this baseline is critical. Whether average is in respect to your teammates and/or opponents depends on what you are doing.
- The playing context further establishes your role and responsibilities within the game. Playing SS or DH is very different. Starting pitcher and reliever is very different. This goes beyond just baseball. Everything I’ve discussed and will discuss applies to ANY sport. You can create WAR for anything, and they all have to adhere to these five pillars.
- The playing environment determines the scoring environment. Coors in the 1990s is different from the Astrodome in the 1970s. Hitting against Pedro is different than hitting against the emergency starter.
- Finally, once you’ve established everything you need relative to the average, we have to assign value to being average. Being average is GOOD. Being average doesn’t mean being 0. It means being 0 above average. The absolute value of 0 is what we call the replacement-level player, or readily available talent. Whatever value that player provides, that becomes our zero level. If a replacement level pitcher allows 6.0 runs per 9 IP (RA/9) and the average pitcher allows 4.7 RA/9, then we set the zero baseline at 6.0 RA/9. And so an average pitcher is worth +0 RA/9 above average or 1.3 RA/9 above replacement.
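The last pillar is easy to illustrate in Python. The 6.0 and 4.7 RA/9 levels are the ones from the text; the 180 IP workload and the function name are hypothetical, picked only to turn the rate into a run total.

```python
REPLACEMENT_RA9 = 6.0   # replacement-level pitcher, from the text
AVERAGE_RA9 = 4.7       # average pitcher, from the text


def runs_above(baseline_ra9: float, pitcher_ra9: float, ip: float) -> float:
    """Runs saved relative to a baseline pitcher over the same innings."""
    return (baseline_ra9 - pitcher_ra9) * ip / 9


ip = 180  # hypothetical workload for an average pitcher
print("above average:     %.1f" % runs_above(AVERAGE_RA9, AVERAGE_RA9, ip))
print("above replacement: %.1f" % runs_above(REPLACEMENT_RA9, AVERAGE_RA9, ip))
```

The same pitcher is worth zero against one baseline and a positive amount against the other, and that positive amount grows with playing time.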
So there you go. If your model adheres to these five pillars, you’ve met the minimum requirements for being a WAR model. In any sport. Of course, it has to functionally make sense, but at least you’ve got the foundation in place.
Wednesday, December 02, 2020
Disclosures:
WAR is Wins Above Replacement if you are new around here.
And if you are new, you should know that I also spearheaded the development of the WAR framework on my old blog circa 2006, with key insights from the Straight Arrow readers, like Patriot, Guy, Rally, David, MGL, and Ray among others. Some of those names are even real. The foundation of WAR leans heavily on the ideas put forth by my saber-heroes Bill James and Pete Palmer. Always meet your heroes. And the conceptual design is inspired by my forays in Fantasy Baseball, Hockey and Football. If you want to build something of value, put your money on it: it’ll force you to leave no stone unturned.
And I should also point out that while I provided the initial 1% of the work in the form of the framework, the other 99% came from the gang at Fangraphs and Baseball Reference and their partners, notably Sean Smith (an aforementioned Straight Arrow reader) in the form of the implementation. It’s critical to make the distinction between a framework and an implementation.
Let’s get to it.
When creating a model, we have to start with assumptions. Indeed, all models try to describe reality in such a way that it is not as cumbersome as reality, but is useful to reflect reality. And to reflect reality, we start with some assumptions.
Assumption #1: Replacement Level
The first one is that a team of low-cost or minor-league free agents will win about 30% of their games. This has been reviewed by several researchers in different instances (including me), and it’s a pretty solid assumption. You might be able to get it as low as 25% or as high as 35%. So 30% is a reasonable assumption. These are the players that are on the bubble, as likely to be in MLB as out of it. Dewayne Wise is a good example of such a player.
So an average team, a .500 team, is +.200 wins above this replacement-level team. With +0.2 WAR per team per game, then 30 teams at 162 games each comes out to about 1000 WAR to give out.
Assumption #2: Pitcher v NonPitcher Split
About 3/7ths of the contributions towards winning come from pitchers and the other 4/7ths come from nonpitchers. To think this through, here’s how to get there: an average team will score and allow 4.4 runs per game.
A team with average pitchers and replacement level nonpitchers will score about 3.4 runs and allow 4.45 runs per game. In other words, they’ll score about 77% as many runs as an average team and allow 1% more runs. Such a team will win 38% of the time. These nonpitchers are 0.120 wins below average.
A team with replacement level pitchers and average nonpitchers will score 4.4 runs and allow 5.35 runs per game. In other words, they’ll score as many runs as an average team and allow 22% more runs. Such a team will win 41% of the time. These pitchers are 0.090 wins below average.
With one group (nonpitchers) generating +0.12 WAR and another group (pitchers) generating +0.09 WAR, this means that the pitchers are generating 3 WAR for every 4 WAR that nonpitchers are generating. In other words, 3/7ths come from pitchers.
We therefore assign 3/7ths of the 1000 WAR to pitchers, or about 430 to pitchers and 570 to nonpitchers.
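The split can be checked with a short Python sketch. It assumes a PythagenPat win estimator with a 0.28 exponent, which is my choice here; the text does not say which estimator produced the 38% and 41% figures. The run levels are the ones quoted above.

```python
def pythagenpat_win_pct(rs: float, ra: float, k: float = 0.28) -> float:
    """Win% from runs scored and allowed per game (assumed estimator)."""
    exponent = (rs + ra) ** k
    win_ratio = (rs / ra) ** exponent
    return win_ratio / (1 + win_ratio)


# Average pitchers with replacement-level nonpitchers.
repl_nonpitchers = pythagenpat_win_pct(rs=3.4, ra=4.45)
# Replacement-level pitchers with average nonpitchers.
repl_pitchers = pythagenpat_win_pct(rs=4.4, ra=5.35)

nonpitcher_gap = 0.5 - repl_nonpitchers   # wins per game below average
pitcher_gap = 0.5 - repl_pitchers
pitcher_share = pitcher_gap / (pitcher_gap + nonpitcher_gap)

total_war = 0.2 * 30 * 162                # about 1000 WAR across MLB
print("win%%: %.3f and %.3f" % (repl_nonpitchers, repl_pitchers))
print("pitcher share: %.3f (3/7 is %.3f)" % (pitcher_share, 3 / 7))
print("WAR pool: %.0f, to pitchers: %.0f" % (total_war, total_war * 3 / 7))
```

The share it prints is close to, though not exactly, 3/7ths, which is consistent with the text treating these as round-number assumptions.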
Assumption #3: SP v RP
It is much harder to pitch as a starting pitcher than as a relief pitcher. I can go through the math in the comments section. It gets a bit more convoluted.
In the end, while relievers pitch 1/3 of the innings, they’ll end up with about 20-25% of the WAR for pitchers. So the 430 WAR for pitchers get split as 100 for relievers and 330 for starters.
Implications
This represents the core of the assumptions for the framework. There are additional assumptions made, and we can deal with it in a future article.
So, what is the implication of these assumptions? With 570 WAR going to nonpitchers, that’s 19 WAR per team. Which means that an average full-time nonpitcher is going to get slightly more than 2 WAR. (An average player with limited playing time will get proportionately fewer WAR.)
As for starting pitchers, we have 330 WAR to give out, or 11 per team. That means each average starter will get slightly more than 2 WAR. (An average SP with limited innings will get proportionately fewer WAR.)
And relievers get 100 WAR or just over 3 per team. Each average reliever gets around 0.5 WAR, all dependent on how many innings they throw.
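The per-team and per-player figures follow from simple division, sketched below in Python. The WAR pools are from the text; the roster slot counts (9 regular nonpitchers, 5 starters, 7 relievers) are my own assumptions, used only to show how an average full-timer lands near the quoted values.

```python
TEAMS = 30

war_pool = {"nonpitchers": 570, "starters": 330, "relievers": 100}
# Assumed full-time slots per team; not stated in the text.
slots_per_team = {"nonpitchers": 9, "starters": 5, "relievers": 7}

per_team = {role: pool / TEAMS for role, pool in war_pool.items()}
per_player = {role: per_team[role] / slots_per_team[role] for role in war_pool}

for role in war_pool:
    print("%-12s %5.1f per team, %.1f per average full-timer"
          % (role, per_team[role], per_player[role]))
```

Changing the slot counts moves the per-player numbers a little, but not the per-team totals.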
Adding contributions beyond average
So with everyone’s baseline set at average, we build up, or down, from there based on each player’s contributions relative to average.
If a player generates 10 more runs than average (RAA), that’s worth +1 win, and we add that to their baseline WAR. If their baseline WAR is 2 wins, then being +10 RAA would give them 3 WAR. Similarly, if they generated 10 fewer runs than average and their baseline WAR was 2 wins, they’d come in at 1 WAR.
And how do we establish each player’s runs above and below average? That’s the perfect thing about the WAR framework. The WAR framework doesn’t insist on how to calculate that. You the user can calculate it any way you want. That’s your personal implementation of the WAR framework.
Implementing the framework
It’s a lot of work. And since Baseball Reference and Fangraphs have both done all that work, and created a ubiquitous navigation, most fans simply refer to one or both. Clubs almost certainly follow this WAR framework or one that is similar, but with possibly very different implementation parameters.
The WAR framework tells you what you need to build a car. You can build a Honda Sedan, a Lexus SUV, or even a hybrid if you are so bold. The framework is perfect. It’s the details that matter, and it’s the implementations where you might find construction issues.
I should also note that these assumptions, while they make sense for the current era, may require fine tuning to look at it historically. Every car needs periodic tuneups.
And there are other assumptions to consider, such as fielding position, parks, as well as AL v NL, especially in those seasons where the leagues don’t overlap in play.
***
I should also point out that this WAR framework can be applied to ANY sport anywhere in the world. While each implementation is naturally league-specific, the framework is not dependent on the sport. If it's a sport where you have humans playing a game with winners and losers, this WAR framework will apply. I've done it for hockey and basketball and volleyball. I know I can do it for soccer and football. I don't know anything about cricket or rugby, but I'm pretty sure I can do it for those as well.
WAR
Monday, November 30, 2020
It’s 21:34. I’m doing this as I type, nothing was pre-written, pre-researched or pre-anything.
***
In 1966, Earl Wilson had a 3.07 ERA, compared to the nominal AL average of 3.44. He allowed (earned) runs at 89% of the AL average. That’s a good number. Gibson had a 2.44 ERA in a league average of 3.61 (or a great 68% of NL average). According to WAR at Baseball Reference, Earl Wilson had the highest WAR among all AL pitchers (5.9), and sixth in MLB just behind Bob Gibson at 6.1.
Something looks terribly, terribly wrong… right?
Let’s work through Bob Gibson to understand how he gets to 6.1 WAR, and fifth place among all pitchers.
1. Runs Allowed
The first step is that we discard ERA in favor of RA/9. Gibson has 14 unearned runs, so his RA/9 is 2.89, compared to the NL average of 4.10. That’s 70% of the NL average. So, he loses a tiny bit of lustre here, but still great.
2. Opponent Adjustment
He faced slightly tougher opponents, a collection of teams that scored a park-adjusted 4.19 runs per game, compared to the league average of 4.11. So he gains a tiny bit of lustre back.
3. Fielding Adjustment
The Cardinals had the best fielding team in the NL, an impact of 0.27 runs per game. So Gibson loses a bit of lustre now (part of his low runs allowed was because of the fielding team). So the opponents, who would normally score 4.19 runs per game, would therefore against a Cardinals defense score 3.92 runs.
4. SP/RP Adjustment
Gibson is a SP and it’s tougher to pitch as a starter, which is an impact of 0.07 runs. So, now his opponents, with Cardinals defense against a SP, would score 3.99 runs per game.
5. Park Adjustment
Finally, Gibson played at overall park-neutral sites, so there’s almost no impact there. It ends up that his opponents, against his fielders, against a SP, at Gibson’s park, would score 3.96 runs per game.
To recap: 2.89 runs allowed per game by Gibson against a context of 3.96 runs (or 73%). Still great, but a smidge less than the initial check using only ERA with zero adjustments.
3.96 runs in 280.1 IP is 123 runs allowed. That’s the “average” baseline. Gibson allowed 90 runs. That difference is 33 runs. That’s his RAA.
(Note: Baseball Reference shows 30. Not sure why, but we’re not going to let 3 runs stop us for this illustration.)
6. Runs to wins
To get it on a more familiar scale, we can convert to wins. We don’t HAVE to. It just makes it easier. These are not “actual” wins, but more like runs-derived wins. The runs-to-win conversion looks to be close to 8.5. So 30 RAA translates to 3.6 WAA (wins above average).
7. Bring in replacement
To account for playing time, we don’t want to compare to “average” but to the bubble player. There is value to being able to eat up innings. There are 13.67 wins assigned to pitchers per team. In a 10 team NL, that’s 136.7 WAR. With 14551 IP, that means we give out 0.0094 WAR per IP (or if you like 0.0845 WAR per 9IP).
Since Gibson has 280.1 IP, he gets 2.6 WAR for his playing time.
8. Now you get WAR
That’s 3.6 + 2.6 = 6.2 WAR. (Baseball Reference shows 6.1. Let’s chalk the 0.1 difference to rounding error.)
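Here are the eight Gibson steps collected into one Python sketch. Every input is a number quoted above; the variable names are mine. The park step is entered as its quoted end result (3.96) because the park factor itself is not given, and the wins conversion uses Reference's 30 RAA with the approximate 8.5 runs per win, as the text does.

```python
def innings(whole: int, outs: int = 0) -> float:
    """Convert baseball notation (280.1 means 280 and 1/3) to true innings."""
    return whole + outs / 3


ip = innings(280, 1)
runs_allowed = 90

# Steps 2-4: opponent quality, then fielding support, then the SP adjustment.
context = 4.19 - 0.27 + 0.07
# Step 5: after the park adjustment the text lands on 3.96 (factor not given).
context_after_park = 3.96

baseline_runs = context_after_park * ip / 9
raa = baseline_runs - runs_allowed

# Step 6: runs to wins, using Reference's 30 RAA and roughly 8.5 runs per win.
waa = 30 / 8.5

# Step 7: 13.67 pitcher wins per team, 10 NL teams, 14551 league IP.
war_per_ip = 13.67 * 10 / 14551
playing_time_war = war_per_ip * ip

# Step 8: add them up.
war = waa + playing_time_war
print("context before park: %.2f" % context)
print("RAA from my numbers: %.0f" % raa)
print("WAR: %.1f" % war)
```

Swapping in another pitcher only requires changing the inputs at the top, plus the league totals in step 7.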
As you can see, a good amount of effort to handle all the little things.
***
Now that we’ve figured it out for Bob Gibson, let’s see how Earl Wilson could have possibly gotten so close to Gibson.
1. Runs Allowed
Wilson had only 4 unearned runs. That’s a 3.20 RA/9 compared to the league average of 3.90 (or 82% of AL average). As you can see, that’s a big step forward, 10 fewer unearned runs than Gibson.
2. Opponent Adjustment
He faced slightly tougher opponents, a collection of teams that scored a park-adjusted 3.99 runs per game, compared to the league average of 3.90. So he gains a bit here too.
3. Fielding Adjustment
Both his teams were above average MLB(*) in fielding, and so he benefited by 0.11 runs here. So the opponents, who would normally score 3.99 runs per game, would therefore against his defense score 3.88 runs. (This context is pretty close to Gibson at this point.)
(*) This will be important in the addendum.
4. SP/RP Adjustment
Wilson is a mostly SP, which is an impact of 0.06 runs. So, now his opponents, with his defense against a SP, would score 3.94 runs per game. (This context is still pretty close to Gibson at this point.)
5. Park Adjustment
Finally, Wilson pitched in a heavy hitters’ park at Fenway and a league-average park at Tiger Stadium. Overall, the effect is 1.03X. So the 3.94 runs we have goes up to 4.06. (Baseball Reference shows 4.08, so some rounding errors on my side.) It ends up that his opponents, against his fielders, against a SP, at Wilson’s parks, would score 4.06 runs per game.
To recap: 3.20 runs allowed per game by Wilson against a context of 4.06 runs (or 79%). So this is pretty good, and a big step up from the initial check using only ERA with zero adjustments.
4.06 runs in 264 IP is 119 runs allowed. That’s the “average” baseline. Wilson allowed 94 runs. That difference is 25 runs. That’s his RAA. That’s 8 runs worse than Gibson.
(Note: Baseball Reference shows 30. Not sure why, so now I am concerned that I can’t get closer. My calculations show an 8 run gap against Gibson, 33 to 25, but Reference shows a RAA of 30 to 30. Why? My guess is that this is a league adjustment. I could probably figure this out if I took another 30 minutes, but it’s 22:08 as I write this, and I’m pretty spent. So, let’s assume I’m missing the AL/NL League Adjustment step.)
6. Runs to wins
With a similar runs-to-win conversion, Wilson’s RAA translates to 3.7 WAA.
7. Bring in replacement
Since Wilson has 264 IP, he gets 2.5 WAR for his playing time.
8. Now you get WAR
That’s 3.7 + 2.5 = 6.2 WAR. (Baseball Reference shows 5.9. Let’s chalk the 0.3 difference to rounding error.)
So there you have it. A 68% ERA compared to the NL, and an 89% ERA compared to the AL, come out to a similar WAR.
***
Addendum:
(*) This is where the issue comes with the fielding adjustment. For most of MLB history, including 1966, the two leagues were in fact separate. So the “league average” truly meant AL and NL were two distinct leagues. But the fielding average is not set to 0 in each of AL and NL. This (I think) probably leads to a construction issue. Again, not sure, but maybe. It’s 22:14. That’s all I’ve got in me tonight.
Bill_James
•
WAR
Sunday, February 23, 2020
At around midnight, I had a tweetstorm on NaiveWAR. What follows is simply all my posts collected into one thread. If there’s anything new beyond the tweets, I’ll make a point of highlighting it by using the Pozterisk.
WAR
Friday, October 11, 2019
Some 12 years ago or so, we developed the WAR framework on the old Book Blog. It's nothing too original, working off what Pete Palmer and Bill James and Mitchel Lichtman had done in some form or other. What I did was give the framework some rules and logical underpinnings, with some key assists by the Straight Arrow readers. The framework could then be implemented under those guidelines. Because those guidelines were so clear and not onerous, it was just a matter of time for someone to implement their version of WAR. (You'd think I would have done it, but nope. I was happy just doing it on an ad-hoc basis. I wanted to see its application to the world of free agents.)
Long-time peer Rally Monkey was the first to do so, and he implemented it with a fantastic presentation. That was soon followed by Fangraphs doing their own flavor, also with an excellent presentation. Baseball Reference used the Rally implementation, and added even more transparency. Basically, the most transparency that you can imagine, while giving the results in as digestible form as it does.
And because Baseball Reference is so clear in its presentation, we can talk about Lance Lynn and Ryu. Ryu has allowed one fewer run per 9IP than Lynn, but Lynn has a 2.5 WAR lead. This is an enormous flip. You'd think there must be something wrong. Well, not according to the assumptions that Reference has clearly laid out. The key to understanding ANY baseball stat is:
what would an average player do in THOSE conditions?
And the conditions that differentiate Lynn and Ryu are the three conditions that we all think about:
- Fielding Support
- Park Impact
- Opposing Hitting Talent
We don't know how the Dodgers fielders helped Ryu specifically. We have several estimates as to how they helped the whole team. Fangraphs has the data for UZR and DRS. If you relied on the UZR estimate, the Dodgers are an average fielding team. If you relied on the DRS estimate, the Dodgers are the 2nd greatest fielding team since 2002 (behind last year's DBacks and ahead of this year's DBacks). Baseball Reference relies on one fielding system, and that fielding system, since 2002, is DRS. As a result, you start with a league average run environment, and remove 0.5 runs per 9IP from that, and that's our starting point for Ryu's (and basically all Dodgers pitchers') fielding-adjusted run environment. So whether Ryu benefited from the Dodgers fielding support or not, it's presumed he earned it (about as much) as his pitching mates. Similarly, the Rangers are presumed to have terrible fielders. And so, Lynn's adjusted run environment adds 0.3 runs per 9IP. So right there, we can explain a 0.8 run gap per 9IP.
The Rangers play in a presumed huge hitter's park, and the Dodgers play in a presumed pitcher's park. That gap is going to add another 0.7 or so runs to Lynn v Ryu.
Finally, Lynn faced tougher hitters than Ryu, which is another 0.5 run gap.
Add it all up, and you have a 2 run gap in conditions, per 9 IP. You can see it right here:
[image]
So now, we take Lynn's actual runs allowed per 9 IP (3.84) and compare it to his presumed conditions (6.25 R/9IP), and we end up with Lynn saving 2.41 runs per 9 IP compared to what an average pitcher would have done given Lynn's presumed conditions. And over 208.1 IP, that's 56 Runs Above Average. (It shows 54, presumably because of rounding.)
Similarly, under Ryu's conditions, an average pitcher would have allowed 4.22 runs per 9 IP compared to his actual 2.61, so Ryu saved 1.61 runs per 9 IP, which over 182.2 IP is 33 Runs Above Average.
There is a runs-per-win conversion where Lynn's 54 runs counts as 5.7 wins, and Ryu's 33 runs counts as 3.6 wins.
The final step is to compare the average pitcher to the replacement level, and add that to each pitcher. So, Lynn's 5.7 WAA becomes 7.6 WAR (so gaining 1.9 wins), and Ryu's 3.6 WAA becomes 5.1 WAR (gaining 1.5 wins). Those are, basically, proportional to their innings pitched (plus I think a league adjustment?).
Anyway, that's how you go from one guy being 22 runs ahead of the other (Ryu over Lynn), or the equivalent of 2.2 wins, to the tables being turned, and the other guy (Lynn) being 2.5 wins ahead (of Ryu).
That's a 4.7 win turnaround, based strictly on the estimated run environment for each pitcher.
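The arithmetic above can be sketched in a few lines. This is only a rough check of the numbers quoted in this post, not Baseball Reference's actual implementation. The function names are made up for illustration, and the only assumption beyond the post is the usual reading of innings notation (208.1 means 208 and one third).

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings ('208.1' means 208 and 1/3) to a decimal."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def runs_above_average(expected_ra9: float, actual_ra9: float, ip: float) -> float:
    """Runs saved versus an average pitcher in the same presumed conditions."""
    return (expected_ra9 - actual_ra9) * ip / 9


# (presumed-conditions R/9, actual R/9, innings pitched), all from the post
pitchers = {
    "Lynn": (6.25, 3.84, "208.1"),
    "Ryu": (4.22, 2.61, "182.2"),
}
raa = {name: runs_above_average(exp, act, innings(ip))
       for name, (exp, act, ip) in pitchers.items()}

for name, value in raa.items():
    print(f"{name}: {value:.1f} runs above average")

# Turnaround: Ryu's raw lead (2.2 wins) plus Lynn's WAR lead (7.6 - 5.1).
raw_lead_ryu = 2.2
war_lead_lynn = 7.6 - 5.1
turnaround = raw_lead_ryu + war_lead_lynn
print(f"Turnaround: {turnaround:.1f} wins")
```

The unrounded results come out to roughly 55.8 and 32.7 runs, which round to the 56 and 33 quoted above.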
Whether you accept all this or not, that's up to you. But all the data is there for you to make an informed choice.
(22)
Comments
• 2019/10/21
•
WAR
Thursday, January 10, 2019
Just a placeholder for stuff I did over at Fangraphs a few years back.
Read More
Tuesday, March 01, 2016
Here's my latest over at Fangraphs, including a poll.
It's running at 45-27 in favor of context-neutral over context-specific, with another 28% having their heads explode.
(7)
Comments
• 2016/03/10
•
WAR
Saturday, February 27, 2016
The question relates to how you see pitchers and the impact of their fielders. We have two pitchers, let’s call them Stephen Strasburg and Adam Wainwright. And they are pitching in the same game.
Strasburg pitches a complete game, striking out 13, without walking anyone, or allowing any extra base hits. But he does allow 10 singles, or at least, he and his fielders allow 10 singles, and that leads to 2 runs.
Wainwright also pitches a complete game, he also doesn’t walk anyone, but he only strikes out 4. He only allows 7 singles, or at least, he and his fielders allow 7 singles, and that leads to only 1 run.
The only thing you know is what I’ve told you. If you wish to infer more, like perhaps the Cardinals fielders helped Adam more than the Nationals fielders helped Stephen, go ahead. If you wish to infer that Wainwright allowed softer hit balls than Strasburg, you can do that if you want. You decide how to interpret the information I’ve given you.
Go over to Fangraphs and vote.
(9)
Comments
• 2016/02/28
•
WAR
Friday, February 26, 2016
Part 6: Relievers giving up 1 run with either a 2-run or 1-run lead.
Part 7: Solo HR v bases-clearing double
Part 8: K v SF with runner on 3B and 1 out
(4)
Comments
• 2016/02/29
•
WAR
When the home team enters the top of the 9th with a 3-run lead, they will win that game 98% of the time. That happens mostly because they get to pick and choose the reliever they want. If they chose a random reliever, they'd win 97% of the time. If they chose a poor reliever, they'd win 96% of the time. It's pretty tough to mess up a 3-run lead, especially when the home team gets one more crack at it in the bottom of the 9th.
So, we have a SP that went 8, and he hands off to the reliever this 3-run lead. The ace reliever comes in. Let's call him Armando Benitez. He walks the first batter, allows a HR to the second, then strikes out the side. The game ends, and his team wins. Armando even gets a "save", whatever that is supposed to imply.
Since he was given a 3-run rope, and he only used 2-runs, he was able to turn a 96% or 98% chance of winning into 100%, all without the help of his fielders. Incredibly, things could have gotten worse, which does happen 2 to 4 percent of the time. In this case, he pitched just bad enough to win.
If you want to vote on the poll, go over to Fangraphs.
(7)
Comments
• 2016/02/26
•
WAR
Thursday, February 25, 2016
For now, let's start the second inning by leaving aside the hitters and talk about defense. Now, when I refer to defense, I mean pitching+fielding. Remember, defense is the whole team, the pitchers and the fielders. We'll worry about how to separate fielders from pitchers in a soon-to-be-asked question. Just not now. Cool?
Let's say that one team defense allows 10 hits with 3 walks. But they are all scattered, and so actually end up with a shutout. Another team defense allows the exact same number of hits and walks. They even allow them in the exact same way. The only difference is the timing. They allowed them bunched up, and so resulted in eight runs allowed. From a team defense perspective, how do you see them in terms of assigning value?
You can vote over at Fangraphs.
(1)
Comments
• 2016/02/26
•
WAR
In trying to summarize the responses to the three questions, so far, what we have in terms of preference is:
- the event, regardless of the context
- the event, within the context of the whole game state (inning, score, base, out)
- the event, within the context of the base-out state
- and far down the list, the event as it ultimately affects the inning
What the responders therefore are gravitating toward is a purely context-neutral metric. But, to the extent that we do want to measure the context-specific impact, that should be kept separate, and perhaps not even tied to the player at all. Just a general "timing" bucket.
If we take the case of the triple in the previous thread, in either case, Hamilton and Dyson will get +1 run, because that's the context-neutral value of the triple, according to Linear Weights.
We immediately add -0.4 runs for timing, because a triple with the bases empty and 0 outs is worth only +0.6 runs. The responders don't want to penalize either guy for getting the triple when they did, and so, to make things add up, we need "-0.4" runs for timing.
Then the three outs, they each get -0.25 runs, as is the standard weight.
So far, we have this:
+1.0 Hamilton
-0.4 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3
That's a total of -0.15 runs. But since the inning started at +0.5 runs of expectancy, and we get 0 runs scored, the total has to be -0.5 runs. So, we add another item:
-0.35 bad timing: leaving runner on base
As for the other scenario:
+1.0 Dyson
-0.4 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3
But, since we actually scored a run, that should come in at +0.5 runs. We need another:
+0.65 good timing: scoring the runner
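Here is a minimal sketch of the two ledgers above, just to show that the timing entries are whatever is left over once you force the inning to add up. The run values (1.0, 0.6, -0.25, and the 0.5 starting expectancy) are the ones quoted in this post. The function and variable names are made up for illustration.

```python
TRIPLE = 1.0              # context-neutral value of a triple
TRIPLE_EMPTY_0_OUT = 0.6  # value of a triple with bases empty, 0 outs
OUT = -0.25               # standard value of a generic out
START_EXPECTANCY = 0.5    # run expectancy at the start of the inning


def inning_ledger(runs_scored: int) -> list[tuple[str, float]]:
    """Leadoff triple, then three outs; the last entry is the timing residual."""
    entries = [
        ("triple", TRIPLE),
        ("timing: limited impact triple", TRIPLE_EMPTY_0_OUT - TRIPLE),
        ("batter1", OUT),
        ("batter2", OUT),
        ("batter3", OUT),
    ]
    subtotal = sum(value for _, value in entries)
    # The inning as a whole must net out to runs scored minus what was expected.
    target = runs_scored - START_EXPECTANCY
    entries.append(("timing: runner on third", target - subtotal))
    return entries


hamilton = inning_ledger(runs_scored=0)  # runner left on base
dyson = inning_ledger(runs_scored=1)     # runner scores

for label, ledger in (("Hamilton", hamilton), ("Dyson", dyson)):
    print(label)
    for item, value in ledger:
        print(f"  {value:+.2f} {item}")
    print(f"  total {sum(v for _, v in ledger):+.2f}")
```

The last line of each ledger reproduces the -0.35 and +0.65 entries, and the totals come to -0.5 and +0.5.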
For a minority, a vocal minority, those "timing" impact runs should be given to the players involved. Looking at the Hamilton one, whereas a generic out is worth -0.25 runs, an out with a runner on third is more costly. So, that -0.35 runs has to be distributed to the three out-makers, for those readers part of the vocal minority. For the readers in the majority, those runs are an after-thought. Maybe they should be considered, so the thing adds up. But, it shouldn't fall on the shoulders of the players involved. Just a general team bucket to capture the various plays affected by timing.
So, that's how you build your WAR:
For each player, figure his context-neutral impact as one value, and his "timing" as another value.
Then, the reader can choose whether to include the timing value or not.
Now, on to the pitchers and fielders!
The post and poll are up on Fangraphs. I'm reposting the post below, but the poll needs to be accessed from there.
Read More
(7)
Comments
• 2016/02/25
•
WAR
Wednesday, February 24, 2016
What I think I'll do is write a blog post here, see the discussion evolve. Then, I will modify the post and the next day, put up the "final" article over at Fangraphs.
Ok, you guys have spoken, and you don't want a bases loaded walk to count the same as a solo HR. Even though the base-out state before and after the event remains unchanged, and the number of runs now in the bank is the same, the WAY it happened matters to most of you. Therefore, we are NOT trying to preserve the runs, we are not trying to make sure the runs add up. You have been clear on that.
Now, let's talk about "preservation of wins". It's a 0-0 game, the bottom of the 9th, the bases are loaded with two outs. Historically, at this point in the game, the batting team would end up winning 68% of the time. It's a high stakes situation, a Leverage Index of 6.4. And the batter walks. The batting team wins, game is over. Ooops, I meant the batter hit a single. No, wait, it was a Grand Slam. No, wait it should have been a Grand Slam, but Robin Ventura decided to abandon the bases after he reached first base. Regardless, the game is over, and the batting team won as soon as the batter touched first base.
Your question:
It's the bottom of the 9th of a tie game, bases loaded.
(a) Same impact. I care about the preservation of wins.
(b) Totally different. I want the HR to count for a lot more.
(23)
Comments
• 2016/02/25
•
WAR
Sunday, February 21, 2016
Wins Above Replacement is an estimate of... something. What that something is is different for every person. While the currency is wins, it's not clear what those wins represent. There are reasonable choices you can make along the way. And for every fork in the road you take, you may diverge yourself from the next guy. This is why WAR can never be one thing.
As a framework, WAR leaves little room for discussion. Whether it's what you see at Baseball Reference or at Fangraphs or openWAR or (to some extent) at Baseball Prospectus, they have as their framework the WAR as was championed on our old blog. But a framework is not the same as an implementation. 95% of the cars on the road all follow the same core design. That's the framework. But a Chevy is different from a Lexus. Those are implementations. And there are as many implementations of WAR as there are baseball fans. This thread is an effort to try to come up with a WAR metric that will satisfy the Straight Arrow readers.
***
I'll ask you a series of questions, starting now. The openWAR guys talk about "preservation of runs". That is a good starting point, and a great way to describe it. So, the question centers around whether we want to make sure that everything adds up at the play level. If you get a bases loaded walk, do we want to make sure that exactly 1 run is accounted for or not?
If you care about "talent", you just want to account for around +.30 runs for offense (and -.30 runs for defense), because you don't want to be concerned with the specific base-out state. (We'll talk about "preservation of wins" in a later question.) Similarly, is a bases empty walk and bases empty single the same thing or not? And if you want to preserve runs, are you ready to accept a bases loaded walk and a solo HR as being the exact same thing?
So, have a discussion, then I'll put up a poll in a day.
(34)
Comments
• 2016/02/24
•
WAR