It is very (very very) simple to figure out Runs Above Average (RAA) for a pitcher. I'll use Paul Skenes as the example.
Take the league average ERA (4.086) and subtract our pitcher's ERA (1.992). That makes Skenes 2.094 runs per 9 IP better than league average.
Since Skenes has 131 IP, we take the above number (2.094/9) and multiply by 131 to give us +30.5 runs above average.
That's it. That is Runs Above Average using ERA-only. That figure for Skenes is 4th highest in MLB, behind Sale (+34 runs), Skubal and Wheeler (+33).
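If you want to follow along with code, here's a minimal sketch of that arithmetic in Python (the function name is mine; the numbers are the ones quoted above):

```python
def era_only_raa(lg_era: float, era: float, ip: float) -> float:
    # (league ERA minus pitcher ERA) is runs per 9 IP better than average;
    # scale that by innings pitched to get Runs Above Average.
    return (lg_era - era) / 9 * ip

# Paul Skenes, using the figures above
print(round(era_only_raa(4.086, 1.992, 131), 1))  # +30.5
```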
Now, you may be asking: what about park factors? Baseball Reference has Skenes pitching in slightly batter-friendly parks. So, that simple league average of 4.086 is actually too simple, since that figure is the same for all pitchers, and we know each pitcher's context can't possibly be identical. Skenes also faces tougher competition than average. Skenes supposedly has weaker fielding support than others. When you make all these adjustments, Skenes actually ends up being +41.5 runs above average. Remember, unadjusted he was at +30.5 runs above average. So, the adjustments give him an extra +11 runs. That's right, his 1.99 ERA is actually NOT giving him enough credit.
Since Baseball Reference is terrific in how they share their data, it's really quite simple to compare the ERA-only RAA to the fully-adjusted RAA they provide.
On this chart, the x-axis is the ERA-only RAA. If you don't want anything adjusted and you just want to rely on ERA, then just look at those numbers.
The y-axis is the bonus (or deduction) you have to apply to your pitcher to account for the context that they end up pitching in. Skenes, for example, is in the right corner, at (+30, +11). That means his ERA-only RAA is +30 runs, and he has a +11 run bonus for his context. So, he's worth +41 RAA.
Some pitchers get FAR more bonus than that. Hunter Greene gets +19 runs of bonus for his context. That means his ERA is really clouded, practically Coors-like in its effect. So, he's +20 runs for his ERA-only and another +19 runs for the context, for a total of +39 RAA.
Erick Fedde is +13 for his ERA-only, and another +17 for his context, giving him +30 runs above average.
We can compare Cy Young candidates Cole Ragans (+17, +10) to Logan Gilbert (+18, -10). You see, both are very similar based on their ERA. But according to Reference, Ragans faced a tough context, while Gilbert had a pretty easy context. That's a 20 run gap between the two in terms of their context. So Ragans ends up being +27 while Gilbert is only +8. In other words, instead of Ragans being 1 run behind Gilbert, he's 19 runs ahead, all because of the 20 run difference in their context.
Now, there's no question that if you are a Mariners fan, you will disagree, and a Royals fan is quite happy. That's unfortunately how these contexts get interpreted: how does it affect MY player?
Chris Flexen is one of the worst pitchers in baseball using ERA, at -18 runs. But Reference says he also had one of the toughest pitching environments to the tune of +17 runs. So overall he ends up being practically league average at -1 runs from average.
Did Chris Bassitt have an ordinary season (-1 RAA)? Or did he have one of the easiest contexts in all of baseball (-15 runs) so that he actually had a disastrous season (-16 RAA leading to -0.1 WAR)?
By ERA, Bassitt is 17 runs better than Flexen. By fully-adjusted Reference method, Flexen is 15 runs better than Bassitt. One had an average season, one had a disastrous season. And which pitcher had which is based on whether to fully trust ERA or to fully accept the adjustments.
Reference lays it all out there for you so you can see what they are doing. You either buy it or you don't. But the transparency is something to be commended.
As my side-project into NaiveWAR continues, I'd like to also highlight the work of Sean Smith, the progenitor of WAR at Baseball Reference.
I currently have two versions of NaiveWAR. The first based solely on a pitcher's Won-Loss record. And the second based solely on the pitcher's Runs Allowed and IP. Whether in my version, or from Sean Smith, we present it in the form of Individualized Won-Loss Records (aka The Indys). My biggest failing in presenting WAR was not including The Indys. And based on what Sean is doing, he seems to perhaps agree as well.
There's a good reason this is needed: the discussion over the replacement level was actually mostly noise relative to what WAR actually is. That is my fault, as that conversation got away from me, and I didn't have a way to control it.
Anyway, you can see my two versions on the left (and since this is deGrom, you'll be able to guess which version is which). And Sean's version is on the right. Sean of course is doing a lot more than what my Naive approach is doing. And, you can see a tremendous amount of overlap. Which really means that all that tremendous extra work, necessary work, is ALSO noise to the main discussion point of WAR. Make no mistake about it: not only is Sean right for doing what he is doing, but I will also be doing an enhanced version (eventually, whenever I have the time).
But more importantly: the Naive approach is necessary to bring everyone to the wading pool, before we jump into the deep end. WAR has taken on a life of its own, too easy to dismiss because it's too easy not to learn what it is. That's why the Naive approach is necessary. We need folks to get into the wading pool, and then into the shallow end, before we get into the deep end. And what we see with deGrom above is that the difference between the shallow end (Version 2) and the deep end (Sean's version) may not be that big of a dive.
In my spare time, I'm working on an open-source WAR, that I call NaiveWAR. Those of you who have been following me know some of the background on NaiveWAR, notably that it is tied (indirectly to start with) to Win/Losses of teams (aka The Individualized Won/Loss Records). My biggest failing in developing the WAR framework was not also providing the mechanism for W/L at the same time. That will be rectified.
The most important part of all this is that it's all based on Retrosheet data, and everyone would be able to recreate what I do. And it would be totally transparent, with plenty of step by step discussion, so everyone can follow along. I was also thinking of potentially using this as a way to teach coders SQL. That's way out in the distance, still have to work things out, but just something I've been thinking about as I'm coding this. I even have the perfect name for this course, which I'll divulge if/when this comes to fruition.
Interestingly, RallyMonkey, who is the progenitor of the WAR you see on Baseball Reference, seems to be embarking on a somewhat similar campaign. You can see a lot of the overlap, with tying things to W/L records, with the emphasis on Retrosheet. The important part of doing that is we'd be able to do it EACH way, with/without tying it to W/L, so you can see the impact, at the seasonal, and career, level. In some respects, he'll go further than I will with regards to fielding, mostly because I have so little interest in trying to make sense of that historical data, given the level of access Statcast provides me. But also partly because by me not doing it, it opens the doors for the Aspiring Saberists to make their mark, that somewhere between my presentation and Rally's presentation, they'll find that inspiration.
I wrote this on Bill James site last year, but since that site may come down, I will reproduce it below.
***
Just a general point regarding WAR v Win Shares, which we can bypass altogether if we just focus on Win Probability Added (WPA), which has the advantage of guaranteeing everything adds up, not only at the game level, but at the individual play level.
And if you look at Pedro's WPA, he comes in at +51 wins above average for his career.
His W/L record is 219-100, or +119, or +59.5 wins above average.
His runs allowed rate is 66% of league average, and Pythag (using 1.82 exponent) says that's close to a .680 record, or +58 wins above average.
So, trying to come to terms with how good Pedro is is pretty straightforward, as we have good agreement using multiple methods. He's +50 to +60 wins above average. This is good enough for my illustration below.
So, if we were to create an "Individualized Won Loss Record" for Pedro, it should be pretty straightforward: let's give out for each pitcher a "game slice" of .42 games for each 9 innings. Pedro's 2827 IP is 314 9-inning games and so he'd get 132 game slices. The average is obviously 66-66.
Since Pedro is about +50 to +60 wins above average, using any method you choose (and using +50 in this illustration), then his Individualized Won-Loss record will come in at 116-16 or so. If you chose .37 games for each 9 innings, then it's 108-8 record. It doesn't matter (too much) what you use, whether .37 or .42 or whatnot.
It will matter (a bit) when you compare to the ".300 level" pitcher, or whatever baseline you choose. A 116-16 record is 76 WAR and 108-8 is 73 WAR.
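Here's a small sketch of that bookkeeping, assuming the .42 game slices per 9 IP and the +50 wins above average figure used in the illustration:

```python
def indiv_record(ip: float, waa: float, slices_per_9: float = 0.42):
    # Assign game slices by innings, center everyone at .500, then shift by wins above average.
    games = ip / 9 * slices_per_9
    wins = games / 2 + waa
    return games, wins, games - wins

def war(wins: float, games: float, baseline: float = 0.300) -> float:
    # Wins above a chosen zero-baseline win percentage.
    return wins - baseline * games

for slices in (0.42, 0.37):
    g, w, l = indiv_record(2827, 50, slices)
    print(f"{slices}: {w:.0f}-{l:.0f} in {g:.0f} iGames, {war(w, g):.0f} WAR")
# 0.42: 116-16 in 132 iGames, 76 WAR
# 0.37: 108-8 in 116 iGames, 73 WAR
```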
The key point is that I can make everything add up at the season, game, or play level. And I can do so by using the centering point of .500. And I really, really, really think the entire problem of WAR v Win Shares is we are not talking about it using two dimensions. Because if either of them is appreciably different from this 108-8 or 116-16 record, then we'd have something more tangible to talk about that would actually move the argument forward.
Can you, Bill, provide the Win Shares / Loss Shares of Pedro's career?
***
Bill: No editorial responses here, because I don’t want this to become a debate exactly, but I can’t produce Pedro’s Win Shares/Loss Shares right now because I haven’t used that spreadsheet in a couple of years and don’t remember what it was called, where it is or how to use it. I’ll look into it, but the next three weeks are the busiest time of the year for me, because this is when we write the annual Bill James Handbook. But I’ll try to remember to get to that.
I asked the Straight Arrow followers which they prefer to value more for a player in their personal uber-metric. And on a roughly 2:1 split, they prefer the solo HR be valued more. I wasn't surprised by the overall results, but the folks leaving comments had a lot of confusion. Let me try to clear some of that up.
Let's talk about the solo HR first. We have bases empty before the batter shows up. The batter hits a HR, and scores a run. The bases are still empty for the next batter. In this case, it's unequivocal: the HR generated exactly 1 run.
Now, with the bases loaded walk, we have, well, the bases loaded to start with. This is what the batter has been GIVEN. The batter didn't earn those three runners. The batter just happens to be there for them. The batter draws a walk. A run scores. And the next batter sees the exact same situation as the batter who got the walk: the bases are loaded. So, in terms of the RUN IMPACT, the bases loaded walk generated exactly 1 run, and this is unequivocal. That there happens to be 3 runners on base (including the batter) after the walk is irrelevant, since there were 3 runners on base before the walk.
So, if you are confused on this point, you need to reread all of this. The run impact of the solo HR and the bases loaded walk are identical: exactly one run was generated.
Now, the question is how to value the PLAYER for each of these two events. A random HR is worth +1.4 runs, because sometimes there are runners on base, and sometimes there are not. On average, it's +1.4 runs. This PARTICULAR HR, the solo HR, is worth exactly +1 run. This is how you can see the breakdown:
+1.4 runs for hitting a random HR
-0.4 runs for hitting a HR with the bases empty
So, combined, that's +1 run. The batter, unfortunately, happened to cash his chips early, and instead of waiting (though naturally, it doesn't work like that) for a runner, he decided to play his HR card with the bases empty. One run was the result, not 1.4. Do we credit the batter with +1.4 runs, and -0.4 runs to the rest of the team for not having the foresight to send a runner on base? Imagine if you will, it was the leadoff HR of the game. Then what? Do you still want to give the batter credit for +1.4 runs for a HR, and somehow give out -0.4 runs to... who... the manager for putting Rickey Henderson or Mookie Betts as his leadoff batter? Or, do we just measure the player exactly in context:
+1 run for hitting a HR with the bases empty
Well, you tell me.
Now, let's talk about that walk. The bases are loaded. The batter did nothing to earn that. He gets a walk. The batter totally earns that. A run scores as a DIRECT RESULT of the walk. Who earns that? And after the walk, the bases are still loaded. From a purely causative agent, the batter directly generated exactly 1 run. The bases were loaded before the batter got the walk, the bases are still loaded after the batter got the walk. The DIFFERENCE is simply this: one walk was added, and one run scored. If this was a RANDOM walk, it would add around +0.3 runs, because most of the time, the walk has no runner on first base, and so, it's really just about the batter getting on base. So, you can see it this way if you like:
+0.3 runs for getting a random walk
+0.7 runs getting a walk while the bases are loaded
So, that's +1 run. The batter waited to play his walk-card while the bases are loaded (not that it works like that). And 1 run was the result, not 0.3 runs. Do we credit the batter with +0.3 runs and the rest of the team +0.7 runs for having the foresight to get on base for that batter knowing the batter would walk? Or do we just measure the batter in context:
+1 run for drawing a walk with the bases loaded
Well, you tell me.
***
I will say this: if you measure a batter based on the ball-strike count, then you would surely treat what he does with a 3-0 pitch differently from an 0-2 pitch. The batters and pitchers are responding to the count, and they are changing their approach on the count. This is ridiculously obvious for any baseball fan to realize. Batters will rarely swing on 3-0, while they are very aggressive on 0-2. Now, naturally, it's the batter/pitcher interaction that puts them into those counts. So, looking at it from the plate appearance level, it's irrelevant if we end up with a 3-0 single or a 0-2 single. A single is a single. But if you are trying to measure each pitch in isolation, then each pitch is going to have a value dependent on the count.
When you have a runner on 1B, the pitcher will naturally pitch differently. They know that 3-0 pitch is much different in impact if there's a runner on 1B or not. And similarly, with a runner on 3B and less than 2 outs, both batter and pitcher will treat the entire plate appearance differently. The pitcher really wants a strikeout, while a batter just wants to make contact. And so, their pitch by pitch approach is dependent on the base-out situation. We can't just treat a long fly out the same, whether there's a runner on 3B and less than 2 outs or not. The entire plate appearance, every pitch, was predicated on knowing the base-out situation. The batter and pitcher both interacted exactly because of that. We can't then just look at it from a high level and say: well, that's a long fly out, just like any other long fly out. It's just an out.
That's not how baseball actually works. And when you create a model, you are trying to represent reality. And the reality is that players are humans and that has to be our starting point. And humans respond to stimulus, they change their behaviour. And a long fly out, with a runner on 3B and less than two outs, matters differently from a bases empty long fly out.
You can of course go all the way here: consider the score is tied in the bottom of the 9th, with a runner on 3B. In THAT scenario, a team might even bring in one of their outfielders. And whether the batter hits a single, double, triple, or even a HR: it makes no difference. A hit wins the game. Or imagine the bases are loaded. In this case, a walk and a HR have equal value: the runner on 3B touching home plate is what makes the team win. And this is where I lose some folks. And I lose them to the point that it unravels the entire thing, and suddenly, we are treating a bases loaded walk as if it was a random walk, and a solo HR as if sometimes there was a runner on base.
Perhaps the biggest source of confusion when it comes to sports data is the amount of Random Variation contained in the observations. In the above thread, I focused on players who had at least 5 Individualized Games (iGames, or iG), which represents half a season, for the most recent season, and at least 15 iG over the previous three seasons. Why did I do that? Because I needed a substantial amount of data in order for the signal to suppress the noise. And with an average of about 20 iGames (the equivalent of two full seasons), that was enough.
Now, I will show you how NOT to find the replacement level. I will focus on players with at least 5 iG in the previous season, with no checking on how they did in prior seasons. With 1424 players, there are 124 who have an Indis win% of under .200. Their average win% was .119.
Does this represent their TRUE talent? No, not at all. It represents SOME of their True Talent, but also SOME Random Variation. These aren't necessarily below replacement level players. They aren't even necessarily players who "played" at a below replacement level. For all we know, these are .350 players who, through bad luck, ended up recording stats at a level of .119. How can we tell?
We can tell by looking for an unbiased estimator. And the best place to look for that is from seasons that are NOT part of the performance observations you select from. And the easiest place to find that is the NEXT season. And in the next season, these players averaged 3.6 iGames at a win% of .382. And that becomes our estimate as to their true talent level.
So, what does this mean? Is .119 the replacement level? Or is it .382? They both have their problems, but the first one has a much bigger problem than the second. The next season, the one with the .382 win%, that's limited to only players who actually played in the next season. This is the survivorship bias. Players who were hurt the previous year might have gotten better the next season. Or players who truly were bad did not get a chance to show how bad they were because they were dropped.
What therefore can we do with that .119? This is where it gets really tricky. The .119 win% is an observation that has more bad luck than good luck. But we want to compare those players to the true talent level of .300 win%, which has an equal amount of good and bad luck. Is it necessarily fair to have negative WAR for those players at .119?
The median Indis win% in the next season was .275. That is probably our best estimate as to the true talent level of the group. That still leaves us with what to do about the .119. Can we represent them as well below replacement level? After all, we probably think their true talent level as a group is .275. How then can we look at them as .119?
This is why we don't want to get too stuck on single-season observations. By expanding our sample size, we get a much truer representation of what the replacement level is. It will be close to .300.
But we are stuck with the idea that players whose true talent is .275, who happened to put up numbers at a .119 level, will get evaluated at that .119 level against the .300 baseline.
The alternative, other than an explicit Regression Toward the Mean, is a floating replacement level. So, the .119 players get compared to say a .150 replacement level. And the .225 players might get compared to say a .240 replacement level. And so on, until you get to .300. This is an implicit Regression Toward the Mean.
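To make the equivalence concrete, here's a sketch of a floating baseline as an implicit Regression Toward the Mean. The 20% weight is my assumption, chosen to roughly reproduce the .150 and .240 examples above; it is not anything anyone has formally proposed.

```python
def floating_baseline(observed_pct: float, weight: float = 0.2, mean: float = 0.300) -> float:
    # Instead of regressing the observation toward .300, float the comparison baseline
    # so that only `weight` of the gap to .300 counts against (or for) the player.
    return observed_pct - weight * (observed_pct - mean)

for obs in (0.119, 0.225, 0.300):
    print(obs, round(floating_baseline(obs), 3))
# .119 -> ~.155, .225 -> .240, .300 -> .300
```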
And this may be what Bill James is talking about. He may actually be proposing a Regression Toward the Mean solution, but instead of it being explicit (meaning adjusting the observations to give us a posterior number to work with), he instead sticks with the observations being unaltered, and floats the comparison baseline. If this is what he is talking about, then my proposal here may be just the way to get both sides on the same page.
On Sept 3, 2022, Dylan Cease faced 29 batters for a shutout, allowing one hit and 2 walks, while striking out 7. The White Sox scored 13 runs.
Did the Sox win because of Cease? Or the batters? Even if the batters had had one of their worst outings, the Sox would have won. Similarly, with 13 runs of support, the pitching would have had to be an enormous disaster to lose that game. For the sake of discussion, let's say that both Cease and the batters contributed equally to the win. Let's give Cease 0.5 wins and 0 losses, and we'll do the same for the batters. And because I don't like to carry decimals, I'll just multiply everything by 100:

50 Wins, 0 Losses: Cease (+0.25 WAA)
50 Wins, 0 Losses: Batters (+0.25 WAA)
Game #2
Now, suppose the batters provided the league average 4 or 5 runs of support, then what? Well, in that case, if the batters are providing league average support, then they are probably contributing 0.25 wins and 0.25 losses. Cease and his sensational game is providing the rest. And since everything has to add up to 1 win and 0 losses, it looks like this:
75 Wins, -25 Losses: Cease (+0.50 WAA)
25 Wins, 25 Losses: Batters (+0.00 WAA)
Now, remember, it's the same Cease performance in either game. He didn't do anything different. But we're assigning a different value to Cease because for this particular example game, winning the game 4-0 or 5-0 puts the spotlight on the pitcher a great deal. In the actual game he had won, the 13-0 game, the spotlight was really shared.
Game #3
Let's go the other way. Let's suppose the batters scored 23 runs instead of 13. Cease still had the great game. Now what? Maybe it looks something like this:
20 Wins, 0 Losses: Cease (+0.10 WAA)
80 Wins, 0 Losses: Batters (+0.40 WAA)
In all these games, it's always the same Cease performance. But with one win available, we have to make some choices as to which players earned their share of the win. Everything has to add up.
Proposal
Now, let me offer an alternative for the actual 13-0 game:

100 Wins, 0 Losses: Cease
98 Wins, 2 Losses: Batters
-98 Wins, -2 Losses: Synergy
So, what did I do here? Well, I'm evaluating Cease independent of his batters. A shutout will always get you a win 100% of the time, so that's what he gets here: 100% wins, 0% losses.
The batters scored 13 runs, which we evaluate independent of the 0 runs allowed by Cease. How often do teams that score 13 runs win? That's 98% of the time. We give the batters 98% wins and 2% losses.
Explanation
Giving Cease 100% of a win and giving the batters 98% of a win, that's 198% wins. But we only have 1 actual win. Since we want to freeze Cease at 100% and freeze the batters at 98% (keeping each independent of the other), we have only one choice left: create a Synergy or Synchronicity bucket, assigning minus 98% wins and minus 2% losses. In other words, the sum of the parts is greater than the whole. And we need to do a reconciliation in order to ensure the sum equals the whole. To do that, we create a Synergy bucket, one that reflects the fact that Cease and the batters are not in Synchronicity for this game.
And this gives us the best evaluation of the contributions of the players independent of their teammates, while also being able to understand the contributions of the players in totality of the actual game.
Over the course of 162 games, we will expect that for most teams that Synergy bucket will total around 0 Wins and 0 Losses, give or take a few wins. This is how we can ensure we can properly evaluate the players, without needing to worry about linking their contributions directly to wins and losses.
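In code form, the reconciliation described above is just subtraction; a minimal sketch, using the 100% and 98% win rates quoted above:

```python
def synergy_bucket(pitcher: tuple, batters: tuple, team: tuple = (1.0, 0.0)) -> tuple:
    # Freeze the pitcher and batter records (each evaluated independently of the other),
    # then create a Synergy bucket so that the parts sum to the actual team result.
    syn_w = team[0] - pitcher[0] - batters[0]
    syn_l = team[1] - pitcher[1] - batters[1]
    return syn_w, syn_l

# Cease's shutout (1.00-0.00) plus the batters' 13 runs (0.98-0.02), in a game worth one win
print(synergy_bucket((1.00, 0.00), (0.98, 0.02)))  # (-0.98, -0.02)
```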
This process ONLY works if we know how to evaluate a player's contributions in the form of runs. Which we do.
Bill James created Win Shares on the idea that we need to ensure that the Player Performances, when added up, matches to Team Wins (at least at the Seasonal Level). Of course, if you do it at the Seasonal Level, you should also do it at the Game Level. Especially if you know all the performances of the players are tracked at the game level. Which, in this day and age, we do know. So we agree: let's break down each game, and assign each win and each loss to the players based on their contributions in those specific games.
Once you take that step however, well... you know, I was going to write about this, but then someone else wrote the argument against doing this better than I could have written it. In other words, the argument against a Win Shares approach, at the game-level. And who made this argument, by reading my mind and kindly attributing it to me, even though I could not have articulated it as well? None other than Bill James:
...Tom's argument that there is no need to justify WAR with actual wins, but I PRESUME that what he is saying is that won-lost outcomes are somewhat random, thus not appropriate to adjust skill measurements to match them. Not just won-lost outcomes that are random, but interim outcomes. You put walks, hits, doubles, homers, stolen bases and errors into a pot, you get a somewhat predictable but somewhat random number of runs scored. Therefore, there is no logical requirement to match the outcome, because the outcome itself is a somewhat unreliable measurement. I ASSUME that is his argument.
I should point out that Bill is not necessarily agreeing with me (he might be, but it's irrelevant if he is). The important part is that Bill articulated better than I could the argument against trying to get things to add up at the game-level.
Now, I will also say that I WILL create this metric. I will make sure everything adds up at the game-level. And once you see those results, you will likely determine that we shouldn't be doing this. That there is so much random variation game to game that to try to assign a whole win to one team and a whole loss to another team will require some unusual choices. But I will do it, because otherwise someone will say "why don't you do it". And the best way to answer that question is to actually do it, so everyone can see that we really shouldn't be doing this.
And my larger point to Bill was that if it doesn't work at the game-level, then it won't work at the seasonal-level. And the only reason it LOOKS like it works at the seasonal level is because 162 games allows us to wash away so much of that random variation. If you had one game where a team wins 20-1 and then loses three other games 2-1, then after 4 games, that's 23 runs scored and 7 runs allowed. With one win and three losses. Four games won't cut it, and maybe forty might. And by 162 games, it'll work out most of the time. Until it doesn't. Like I said, only Random Variation saves you. And if we rely on Random Variation, then why bother?
On Sept 8, 2019, the Astros beat the Mariners 21 to 1. That 20 run differential was based on the Astros batters getting 22 hits and 7 walks, while the Astros pitchers allowed 1 hit (a HR allowed by Cole).
The issue
In a context-neutral setting, we would have expected a 17 run differential, fairly close to the actual 20 run differential. The context-neutral runs-to-win conversion is roughly 10 runs per win, and so, sabermetrically, the Astros were +2.0 wins above average (WAA) using their actual runs scored and +1.7 WAA if we relied on their wOBA by their batters and pitchers.
Every game however has the winner with +0.5 wins above average (and every loser is -0.5 WAA). In order to properly credit the Astros batters and pitchers with their excess runs, we need to devalue their runs in the context of this game. The typical way is to use Win Probability Added (WPA), which has the nice property of guaranteeing the winner gets +0.5 WAA and the loser gets -0.5 WAA. However, WPA depends on real-time information. This is a fine approach if you are a bettor or a fan, living in the moment. But from the standpoint that a run scored in the first inning is as impactful as a run scored in the last inning, WPA is not the tool for this job.
I will now introduce the tool for this job.
Pluses and Minuses
We can break up all the performance of the batters and pitchers in terms of good things they did to advance toward a win (the pluses) and the bad things (the minuses). For batters, the pluses are the hits and walks; for pitchers, the pluses are the outs they record. On the minus side, it's the flip side: for batters, the outs they make; for pitchers, the hits and walks they allow.
For the game in question, the Astros batters generated +18.5 positive runs, while the Astros pitchers generated +6.5 positive runs. In terms of negative runs, Astros batters are at -6.5, while the Astros pitchers are -1.5. Adding it up, and the Astros players are +25 positive runs and -8 negative runs. The context-neutral total is +17 runs above average, which we are trying to translate into +0.5 wins.
The approach
So, how do we get there? We are going to ultimately use a 10 runs per win conversion, so we are trying to get an effective +5 runs above average (to convert to the +0.5 WAA). We treat the -8 negative runs as our baseline for this game. In order to get to +0.5 WAA (or +5 runs above average), we need +13 positive runs. Since we have +25 positive runs, we apply a 13/25 factor. In other words, we roughly chop in half all the positive things that the Astros batters and pitchers did, since that is the excess. While it is bad luck that Gerrit Cole paired his 1-hitter with his batters scoring 21 runs, we still only have +0.5 WAA to hand out. And we get there by diminishing all the good things the Astros players did in this case.
On the flip side are the Mariners. They had +8 positive runs and -25 negative runs. All of the bad stuff they did also gets chopped in half. While it was a terrible performance all-around, there is still only one loss in the game, or -0.5 WAA.
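Here's a sketch of that rescaling, using the totals above; the rule, as described, is to freeze the smaller bucket as the baseline and scale the other one so the game nets out to plus or minus half a win.

```python
def rescale_to_game(pos: float, neg: float, target_waa: float, runs_per_win: float = 10.0):
    # Freeze the smaller-magnitude bucket as the baseline and scale the larger one
    # so that pos + neg equals the target (+/- 0.5 WAA, at 10 runs per win).
    target = target_waa * runs_per_win
    if abs(pos) >= abs(neg):
        return target - neg, neg    # scale the positives (a (target - neg)/pos factor)
    return pos, target - pos        # scale the negatives (a (target - pos)/neg factor)

print(rescale_to_game(25, -8, +0.5))   # Astros: (13.0, -8) -- positives get the 13/25 factor
print(rescale_to_game(8, -25, -0.5))   # Mariners: (8, -13.0) -- negatives chopped roughly in half
```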
So, why is this approach better than WPA? Well, WPA treats each plate appearance as its unit-of-work. Once the plate appearance is over, the transaction is over, and so, we reassess where we are in the game. In this game, the impact of the last plate appearance is far far lower than the impact of the first plate appearance. As I said, that is fine if you live in the moment. But if you think of the entire game as a single unit-of-work, that the game itself is one transaction, then every plate appearance has to be treated independent of the score in real-time. And instead, it has to be evaluated dependent on the final score. And the approach I have laid out is one way to get there.
Close game
On Apr 20, 2010, the Padres beat the Giants 1-0. In this game, the Padres had only one hit and three walks, while the Giants scattered six hits and two walks. In a context-neutral setting, the Giants played better. In reality, it is the Padres that won the game. We want to recognize their performance in the context of this game.
The Padres earned +7 positive runs and -10 negative runs. We want to convert their context neutral -3 runs above average to their game-winning +5 runs above average. Treating their +7 positive runs as the baseline, then we need -2 negative runs to get our overall +5 RAA. And so, their -10 negative runs will count as only -2 negative runs within the context of this game. They did a lot of bad stuff, but in context, it did not really hurt them. So, we only count 20% of the negative runs in terms of their win impact.
Alternative
Does all of this make sense? I am not sure. I could have just as well used the -10 negative runs as the baseline, and then counted the +7 positive runs as +15, and therefore doubled all the positive things they did. Or some combination of the two, maybe treat it as an expected +11 positive runs and -6 negative runs. And so, we apply a factor of 11/7 to the +7 positive runs and 6/10 to the -10 negative runs.
In any case, we now have a framework for a game-level WPA-type of approach. And we just have to figure out the details.
There are two ways to create a won-loss record. One is a rather convoluted way, and another is the simple way. The convoluted way is to start everyone with a 0-0 record, and add in wins and add in losses based on their performance. You will need to jump through hoops to make this work, and even then, the end result will be open to many issues. I'm not going to do that.
The simple way is to have a two-step process. The first step is to assign everyone a .500 record, based on their playing time (or more specifically, their presence). For pitchers, it would be some combination of innings pitched, batters faced, and leverage index. We can worry about the machinations later on, so to move the discussion forward, let's just use IP. For nonpitchers, it's some combination of games played, plate appearances, innings played. Again, to move the discussion forward, let's just use PA.
The average baseline
We would like the sum-of-parts to equal-the-whole. That means we need to assign 162 games to our individual players. Again, to move the discussion forward, let's give about 43% of the games to pitchers and 57% of the games to nonpitchers. We can worry about the true split later. So for each game played, we have 0.43 game slices to pitchers and 0.57 game slices to batters. With an average of 27 outs per game, this means that each out by the pitcher will earn that pitcher 0.016 game slices. And with about 38 PA each game, that means each PA will earn the batter 0.015 game slices.
Over the course of a 667 PA season, this means the batter will earn 10 game slices. And similarly for a pitcher with 208.1 IP (625 outs), that pitcher will also earn 10 game slices. I will call these Individualized Games, or iGames, or iG.
What would an average player have as an Individualized Won-Loss record, given they have 10 iGames? Right, a 5-5 record. That means 5 iWins and 5 iLosses, or a 5-5 Won-Loss record.
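For those following along, here's that arithmetic; the 43/57 split and the 27 outs / 38 PA per game figures are the placeholder numbers from above.

```python
PITCHER_SHARE, BATTER_SHARE = 0.43, 0.57   # placeholder split of each team game
OUTS_PER_GAME, PA_PER_GAME = 27, 38

slices_per_out = PITCHER_SHARE / OUTS_PER_GAME   # ~0.016 game slices per out recorded
slices_per_pa = BATTER_SHARE / PA_PER_GAME       # ~0.015 game slices per plate appearance

print(round(667 * slices_per_pa))    # a 667 PA batter: ~10 iGames
print(round(625 * slices_per_out))   # a 208.1 IP (625 out) pitcher: ~10 iGames
```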
Above and below average players
But what about above average players? Players who are (estimated to be) +10 runs above average (RAA) are also (estimated to be) +1 wins above average (WAA). Now, what are all these estimated qualifiers? Because players work alongside other players, they don't really generate runs on their own. With batters, we can determine the number of bases they get for themselves, and partly the number of bases they add to their baserunners. And we can determine the outs they make for themselves and partly the outs they cause to runners on base. So, while we have a good handle as to their base-out contribution, to directly link them to runs requires some amount of estimation. And then to translate those runs to wins requires another level of estimation. Any Runs Created type metric you see is really an estimate, and not a direct run. So, we do all of these estimations as our best effort to tie those bases and outs into wins.
So, with +10 RAA being the equivalent to +1 WAA, then a player with 10 iGames would have a 6-4 Individualized Won-Loss record. And similarly, a player who is minus 10 runs relative to average would have a 4-6 record.
Every player will therefore be assigned a W-L record, based both on their presence on the field, as well as the impact of their resulting performance.
Value of Players
Where this will be interesting is comparing players with different amounts of presence on the field. Suppose we accept the on-the-field impact of a full-time player with a 6-4 record. And suppose we accept the impact of an oft-injured great player with a 5-1 record. Which player has more value? This is where you come in. You get to choose.
We both agree that the first player is +1 WAA playing 10 iGames and so has the impact of a 6-4 W-L record. And we both agree that the second player is +2 WAA playing 6 iGames, with the impact of a 5-1 record. There's little ambiguity or uncertainty in these claims. But in terms of which has more value, that has a lot more uncertainty.
In a WAR discussion where we set the zero-baseline at .300 win percentage, the 6-4 record is compared to the zero-baseline of 3-7. And so, that player is worth 3 wins. That's their value above zero. And the player with the 5-1 record is compared to the zero-baseline of 1.8-4.2. And so that player is worth 5 minus 1.8, or 3.2 wins of value. Therefore, if you, the specific user of the Individualized Won-Loss record insist on a zero baseline of .300, you would therefore conclude that the 5-1 player has more value than the 6-4 player.
Of course, you specifically can decide the zero-baseline, the no-value point, is a .200 win percentage. You believe that a player has value if he can perform at above a .200 level. That's fine, that's your choice, your belief. In this case, 6-4 is compared to zero-baseline of 2-8, and so has 4 wins of value. And 5-1 is compared to 1.2-4.8, and so has 3.8 wins of value. If you are a .200 proponent of zero, then 6-4 has more value than 5-1. I won't tell you you are wrong.
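The value calculation is just the record compared against a baseline record over the same number of iGames; here it is as a sketch, with the two players and the two baselines discussed above:

```python
def wins_above_baseline(wins: float, losses: float, baseline: float) -> float:
    # Value above the chosen zero-baseline win percentage, over the player's iGames.
    return wins - baseline * (wins + losses)

for w, l in [(6, 4), (5, 1)]:
    print(f"{w}-{l}: {wins_above_baseline(w, l, 0.300):.1f} above .300, "
          f"{wins_above_baseline(w, l, 0.200):.1f} above .200")
# 6-4: 3.0 above .300, 4.0 above .200
# 5-1: 3.2 above .300, 3.8 above .200
```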
I'm a proponent of .300, and so that's my zero-baseline.
Run Expectancy by the 24 base-out states is the basis for the entire sabermetric revolution. It may not be front-and-center in most of what you see, but behind the scenes, it's what drives virtually every metric you see. The idea behind RE24 is very intuitive: you start with the run potential based on the number of runners you have on base and the number of outs. After the batter's time at bat is over, there is a new run potential based on the number of runners and outs. The difference in the before/after of this run potential we attribute to the event that caused the change.
At its simplest, imagine what the leadoff batter is faced with. Bases are empty, and there are 0 outs. If we assume an average of 4.5 runs scored per game, that means there are 0.5 runs expected in that half-inning. This is the before-state. The batter hits a homer. The runner-out after-state is obviously the same as the before-state, but we have a run in the bank. So the value of the play is the one run in the bank, plus the 0.5 of the after-state, minus the 0.5 of the before-state. The run value of the HR is therefore 1.0. It is no surprise therefore that we have 1 RBI.
Now, suppose instead we had a runner on 1B and 0 outs. That runner has a nearly 40% chance of scoring, and so is worth 0.40 runs. The run expectancy for the inning is that runner on base (0.40 runs) plus all the future runs expected (which we know is 0.50 runs), or a before-state run-value of 0.90 runs. We now have a HR: that puts 2 runs in the bank, and the after-state run-value is 0.50 runs. So, we have 2 plus 0.5 minus 0.9, which equals 1.6 runs. The run value of the HR, with a runner on 1B and 0 outs is therefore 1.6 runs. The RBI however will give that batter 2 runs.
The RBI depends on the runners on base, and therefore, won't partition the value of that run that scored. We have a double-counting scenario: the runner gets his run scored, and the batter gets his RBI. In this particular case, the runner that scored should get 0.4 runs and the batter that hit the HR should get 0.6 runs on that runner, and 1 run for themselves. So of the 2 runs, 1.6 is for the HR.
Since the RBI will always get credit for every run, we will have an excess scenario in every case. In the particular case of runner on 1B and 0 outs, the excess is 0.4 runs. So as to not conflict with the definition of RBI, we'll create a parallel metric called RBA, Runs Batted Above what the runners themselves had already earned.
Let's look at the situation with the most excess runs, where the difference in RBI and RBA is greatest: bases loaded, 0 outs. The runner on 3B already has an 85% chance of scoring. That's his run value: 0.85 runs. The runner on 2B has a 60% chance of scoring, and so his run value is 0.60. And the runner on 1B we know has a 40% chance of scoring, which is why his run value is 0.40. The total run expectancy for the bases loaded 0 outs is therefore 0.85 + 0.60 + 0.40 plus 0.50 for all future runs that inning, for a total of 2.35 runs expectancy. So the batter, stepping to the plate with the bases loaded and 0 outs is already in a situation where we expect 2.35 runs to score from that point to the end of the inning. He hits a grand slam. So, 4 runs score, and we have 4 RBI. But, the batter naturally shouldn't get credit for all those 4 runs. The batter did nothing to put those runners on. Some of those runners were going to score.
So, what value did the batter add? Well, the runner on 3B, who had an 85% chance of scoring, the batter added 0.15 Runs Batted Above what the runner already earned. The runner on 2B had a 60% chance of scoring, so the batter gets 0.40 RBA. And the batter gets 0.60 RBA for the runner on 1B. And naturally, the batter gets a full 1 RBA for driving themselves in. Add it up, and we get: 1 + 0.6 + 0.4 + 0.15 = 2.15 RBA. This is what the batter earned. The 4 RBI greatly exaggerated his part in those 4 runs. And so we have an excess of 1.85 runs between how the RBI views the batter's contribution and how RBA establishes it.
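Here's a sketch of that RE24 arithmetic, using the rounded scoring probabilities (.40, .60, .85 with 0 outs) and the 0.5 future-runs figure from above:

```python
SCORE_PROB = {"1B": 0.40, "2B": 0.60, "3B": 0.85}  # chance each runner scores, 0 outs
FUTURE_RUNS = 0.50                                  # expected runs from the batter onward

def hr_run_value(runners):
    # Run value of a HR: runs banked, plus the after-state run expectancy,
    # minus the before-state run expectancy.
    re_before = sum(SCORE_PROB[b] for b in runners) + FUTURE_RUNS
    runs_banked = len(runners) + 1
    re_after = FUTURE_RUNS                          # bases are empty again after a HR
    return runs_banked + re_after - re_before

def hr_rba(runners):
    # Runs Batted Above what each runner had already earned: 1 for the batter,
    # plus (1 minus the scoring probability) for each runner driven in.
    return 1 + sum(1 - SCORE_PROB[b] for b in runners)

print(round(hr_run_value([]), 2))                   # solo HR: 1.0
print(round(hr_run_value(["1B"]), 2))               # runner on 1B, 0 outs: 1.6
print(round(hr_rba(["1B", "2B", "3B"]), 2))         # grand slam, 0 outs: 2.15 RBA (vs 4 RBI)
```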
Below you will find the true value of the HR by the 24 base-out states, along with the excess runs when you rely on RBI. On average, the run value of the HR is 1.4. Since there's an average of 0.6 runners on base, the average number of RBI per HR is 1.6 and therefore we have an excess of 0.2 runs when you rely on RBI for each HR hit.
As some of you may know, the run value of the strikeout is very very similar to the run value of a non-strikeout out. That's on average. However, when it comes to the run value of the strikeout with one out, and a runner on 3B, well, that run value is very very very different from a non-K out. A flyout could score a runner from 3B, while a strikeout will almost never do so (only a WP/PB will help). Table 50 in The Book shows this quite clearly, with a gap of over .30 runs between a strikeout and non-K with 1 out and about .20 runs with 0 outs.
The batter knows the directional value, if not the magnitude. So does the pitcher. And the manager. And fans. Everyone knows this. What happens when the participants know? Well, they are human, so they respond to stimulus. They change their approach. Of course, BOTH participants are aware. Whereas the batter will take an approach to reduce his chance of strikeouts (at the cost of other production), the pitcher will increase his chance of strikeouts (at the cost of other production). Overall, this tends to cancel out, so that at the league level, things look like there was no change.
At the individual player level however, things do in fact change. It would have to.
One way for us to see it is to split up a batter's performance based on whether there is a runner on 3B, less than two outs, and 1B open (the target context), and everything else (the rest). The target context only accounts for 1% of a batter's plate appearances. But still, even in that small sample, there is signal.
So how do we find it?
I will introduce a somewhat novel approach, and this is going to have some wide-ranging impact on all future similar studies. We are going to run a regression of the split data (target context and rest) against next season strikeout rate in the targeted context. The question we are going to ask is how much weight do we give the target context in the current season in order to predict next season SO rate in the same targeted context.
This chart suggests we want to overweight our target context data at double the rest of the data. Once you do that, you maximize correlation. Now, we couldn't get much more correlation because we are only dealing with 1% of the data in the context. There's only so much you can do with this.
What this chart ALSO shows is you do not want to overweight too much. If you try to give the target data 3X the weight, this is worse than not overweighting it at all. This exactly gets to the heart of splits-data. While it's nice to show the splits data, there's only so much you should rely on it. In this particular study, that overweighting is 2X, meaning you are taking 1% of the splits data and counting it as 2%. While this is substantial overweighting, you still have to consider the remaining 99% of the data.
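For those curious how such a study might be set up, here's a rough sketch of the overweighting idea; the data fields are hypothetical, and this is not the actual regression behind the chart.

```python
import numpy as np

def blended_k_rate(target_k, target_pa, rest_k, rest_pa, w):
    # Pooled strikeout rate, counting the target-context sample at w times its actual size.
    return (w * target_k + rest_k) / (w * target_pa + rest_pa)

def best_overweight(players, weights=(1, 2, 3)):
    # players: list of dicts with current-season split counts and next season's
    # target-context K rate. Pick the weight that maximizes the correlation.
    best = None
    for w in weights:
        x = [blended_k_rate(p["target_k"], p["target_pa"], p["rest_k"], p["rest_pa"], w)
             for p in players]
        y = [p["next_target_k_rate"] for p in players]
        r = np.corrcoef(x, y)[0, 1]
        if best is None or r > best[1]:
            best = (w, r)
    return best
```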
Craig Kimbrel in 2022 is a good example of how much it matters whether you decide to tie performance to timing.
Kimbrel pitched in 63 games. In 43 of those, he had a net positive impact, where he saved 21 more runs than the league average pitcher. In the other 20 games, he had a net negative impact, where he allowed 21 more runs than the league average pitcher. As you can see, overall, he's an average reliever.
However! However, in the 10 games where the score was the closest, the crucialness highest, he pitched 8.1 IP, with 14 runs allowed (9 earned). That's 8 more runs allowed than the league average pitcher. And when you allow runs in really close situations, you can easily turn a possible win into a sure loss. In the 24 games where the crucialness of the game was the lowest, he pitched 23.2 IP, with 2 runs, both earned. That means he saved 10 more runs than the league average. But saving runs in blowouts isn't really worth much in real time.
In other words, a run saved is not a run earned. It really depends on WHEN you save the run.
So, how can we do this? This is what Win Probability Added (WPA) is about. In the 10 games that were the most crucial, those 8 extra runs he allowed above what an average pitcher would have done, those are worth minus 2.8 wins. In other words, 8 extra runs is worth minus 3 wins. It's an almost 3:1 ratio of runs to wins.
And in the 24 games where there was little crucialness, saving 10 extra runs only gave the Dodgers +0.4 more wins. That's a 25:1 ratio of runs to wins.
Doing this for all his games, his performance based on runs, which was league average, translated to -1.4 wins for the Dodgers.
Craig Kimbrel "mistimed" his performance to his context. This is REAL to the Dodgers, as it resulted in fewer wins. And we can trace it directly to him having a bunch of bad games when the game was on the line, and a bunch of good games in "tune up" games.
Do we necessarily want to assign the mistiming to Kimbrel? Maybe we do. Or maybe we don't. My preference is we create a timing bucket, and we show it like this:
0 runs above average: Kimbrel Performance
0 wins above average: Kimbrel Performance translated to wins, assuming Random timing
minus 1.4 wins relative to average: Kimbrel Performance based on Actual timing
And doing it this way, we are being true to exactly what happened. And we are allowing the user to choose, for themselves, whether they really care about timing or not. We are explaining everything that's happened.
If we wanted to split it based on performance with runners on base or not, and home or away, and against LHH and RHH, the above approach allows for all of that. I simply don't believe in wrapping everything up into one big number. It's like being given the answer to a test.
This has been my position for the last twenty years, and I believe it's the bridge that will bring everyone together.
In 2022, Trevor Williams pitched 89.2 IP, with a 3.21 ERA and 3.88 FIP. That seems like a fine season. Yet over 70% of the time, he pitched in low leverage situations. So, to the extent a 3.21 ERA can help, he didn't really come into games where he could make a difference.
It gets worse though. In the few occasions where he DID come into medium or high leverage situations, he was noticeably worse than average. In other words, he piled up his good performances when it counted the least.
In his 15 games with a Leverage Index (LI) of 0.48 or higher (average of 0.86), he threw 45.2 IP with 22 ER, an ERA of 4.34. In the remaining 15 games with an LI of 0.21 or lower (average of 0.12), he threw 44 IP with 10 ER, an ERA of 2.05.
What does an LI of 0.12 mean? It means that those situations have 12% the impact of a random situation. So, while he may have thrown 44 IP, the IMPACT of those innings is equivalent to 12% of that or 6 IP. This means that instead of counting his 2.05 ERA in 44 IP, we should count them as if they occurred on only 6 IP. We can call these Leveraged IP. Similarly, his 45.2 IP that happened in LI games that averaged 0.86 LI is really 39 Leveraged IP at 4.34 ERA.
Therefore, instead of half of his innings having a 4.34 ERA and the other half at 2.05 (for his seasonal average of 3.21), we instead have 39 Leveraged IP at 4.34 ERA and 6 Leveraged IP at 2.05 ERA. As you can see, the average ERA, his Leveraged ERA comes in at 3.89. That's the actual impact his performance had in the games he pitched. So, no longer would we look at his shiny 3.21 Actual ERA, but instead the more reasonable 3.89 Leveraged ERA.
We do this if we care about evaluating performances based in the games they occurred. If you think everything should be evaluated in random situation, then you can stick with his 3.21 ERA.
Here's a specific question to consider. There was a game a few years ago where Mookie Betts hit a grand slam, making the game 15-2 (or some such). So, here are the questions to consider:
1. Do you want to distinguish between a solo HR and grand slam? And if so, do you want to totally make the distinction, or take some 50/50 position on it?
2. Do you want to distinguish between hitting a grand slam in a blowout compared to one that wins the game in the bottom of the 9th? And if so, how much distinction will you make?
Whoever wants to create their own uber-metric has decisions to make. Are they trying to simply evaluate the player absent the context? Do they want to make sure all the runs are accounted for? Do they want to make sure that the win in that game is properly distributed to the players who helped win that game?
Everyone will have their own view here, which is why there are multiple plausible solutions.
FIP has perfect imperfections. Its imperfections are actually attributes to maintain.
We should use plate appearances (PA), but since IP is ubiquitous, we go with IP.
The coefficients should not be integers, but 13, 3, -2 is also easier to remember than whatever it actually should be.
The coefficients should change by run environment, but if you did that, no one would remember the coefficients.
Run scoring at the team or pitcher level does not follow the linear approach suggested by FIP, but who wants to see exponents in an equation?
Having said all that: there's nothing wrong in showing the Classic FIP, while also having an enhanced version for use in more specific player evaluations. That enhanced version uses TTO-based wOBA, or wOBAtto, where TTO is the three true outcomes, meaning HR, SO, and Walks (and walks really means unintentional walks plus hit by pitches). And we can convert wOBAtto into wOBAfip.
So, what did I just say? We start with the Standard wOBA:
0.7 walks
0.9 singles
1.25 doubles
1.6 triples
2.0 homers
So, we give 2 points for each homer, and .7 points for each unintentional walk or hit batter, and so on. We add them up and divide by PA (plate appearances, while excluding IBB) to get wOBA. You will note that we don't consider outs or strikeouts explicitly, since the coefficient for those is 0: zero times anything is zero.
But, let's make it explicit:
0.0 strikeouts
0.0 batted ball outs
0.7 walks
0.9 singles
1.25 doubles
1.6 triples
2.0 homers
Now, let me rearrange the above slightly:
0.0 strikeouts
0.0 batted ball outs
0.9 singles
1.25 doubles
1.6 triples
0.7 walks
2.0 homers
You will notice that batted ball outs, singles, doubles, triples are now clumped together: those are all balls-in-park (BIP) events, events that involve fielders. Since FIP (fielding independent pitching) makes a point of separating performances that involve fielders from those that don't, we are not going to give the pitcher any credit for any balls-in-park outcomes. Whether they give up 20 singles or 0, we don't care. That's because it's not the pitcher that gives up hits and outs, but rather the Synchronicity between pitchers and fielders that results in those outcomes. Whatever influence the pitcher has on batted ball outcomes we will handle in a later step, in a different metric. We want FIP to be as pure as we can make it. Just as OBP makes no distinction between hit batters and home runs (they all count the same), so too will FIP remain agnostic as to balls-in-park outcomes.
The average wOBA on balls-in-park is 0.300. So, we can create a wOBAfip equation to that effect:
0.0 strikeouts
0.3 BIP
0.7 walks
2.0 homers
Let's take Jacob deGrom in 2021. He had 6 homers allowed, so that's 12 points. He had 12 unintentional walks and hit batters, so that's 8.4 points. His 146 strikeouts is of course 0 points. The remaining 160 BIP have a value of 48 points. We add all that up, 48 + 12 + 8.4 to give us 68.4 points, which we divide by his 324 PA to give us a wOBAfip of .211. This figure led MLB in 2021, out of all pitchers with at least TWO IP.
How can we convert wOBAfip onto the ERA scale? Well, wOBA squared roughly approximates run scoring. Fans of the original Runs Created are well aware that OBP times SLG has a roughly linear relationship to Runs Scored. Since wOBA itself has a roughly linear relationship to each of OBP and SLG, then wOBA squared will have a linear relationship to Runs Scored.
In 2021, the wOBAfip for the league was .324. Since deGrom is .211, that's 65% of the league average. If you square that you get 42%, which means that deGrom's Runs Allowed is 42% of the league average of 4.27, or 1.81. That is deGrom's wOBAfip, expressed on the ERA scale. You can compare that to his actual FIP of 1.28. That's quite the discrepancy. Then again, his ERA in 2021 was 1.08. So, in our efforts to correct all the imperfections of Classic FIP by using wOBAfip, we ended up with a worse result. Why? Well, principally it's due to deGrom having a BABIP that was far far below the league average. Had I used .23 instead of .3 for his BIP in the wOBAfip equation, I would have been very close to his Classic FIP.
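Putting the whole chain into a sketch, using the coefficients above (2.0 HR, 0.7 unintentional walk or HBP, 0 strikeout, 0.3 per ball in park) and the 2021 deGrom line:

```python
def wobafip(hr, uibb_hbp, so, bip):
    # wOBAfip: weight the three true outcomes, treat every ball in park as league average (.300).
    pa = hr + uibb_hbp + so + bip
    return (2.0 * hr + 0.7 * uibb_hbp + 0.0 * so + 0.3 * bip) / pa

def to_run_scale(woba, lg_woba, lg_runs):
    # Run scoring is roughly proportional to wOBA squared, so square the ratio to league average.
    return (woba / lg_woba) ** 2 * lg_runs

degrom = wobafip(hr=6, uibb_hbp=12, so=146, bip=160)
print(round(degrom, 3))                              # .211
print(round(to_run_scale(degrom, 0.324, 4.27), 2))   # ~1.81 on the ERA scale
```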
In other words, all the wrongs in Classic FIP conspired to give us a right answer. And this is not just a deGrom issue. The correlation is actually a bit stronger with Classic FIP than with wOBAfip when run against that season's ERA. As much as the simplicity of Classic FIP should not work, a reconstruction of FIP that corrects all the imperfections and adds clarity along with complexity ends up not helping us. Occam's Razor is the reason Classic FIP will survive.
The one piece of good news is that the extreme results that broke Classic FIP are handled by wOBAfip.
I responded to Bill's issue with Dave Parker having 0.2 WAR on Baseball Reference, even though he had 116 RBIs. My response to him doubles as to how I would handle it in my implementation of WAR.
Bill said: WAR credits [Dave Parker] with 0.2 WAR.
My response: The Reference implementation (rWAR) gives him 0.2 while the Fangraphs (fWAR) gives him 0.7, not that that changes the main point.
However, I can easily create a WAR implementation that uses performance by the 24 base-out states to give Parker close to 3.0 WAR. This is because his performance with runners on compared to bases empty was outlandishly different: .225/.272/.357 versus .329/.393/.617 for his BA/OBP/SLG.
Since each of Reference and Fangraphs has taken the position to NOT consider the runner/out situation, naturally your point about RBI is not going to be addressed by their implementations. This is *not* a WAR issue. It's an implementation issue.
Nothing in the WAR framework insists on which measure of batting to use. And it's very (very very) easy to swap out one for the other. Which I just did.
In 2022, the Houston Astros made 2629 outs on the 3652 batted balls in park that were tracked (which represents 99% of all such batted balls). Compared to the league average team, that is 81 more outs.
If we focus on the fielders Outs Above Average (OAA), they were +32, which is among the league leaders. That is, their fielders made 32 more outs than average fielders would have made, given the starting location of the Astros fielders. This last condition is important. We are only giving credit to the individual Astros fielders for their performance AFTER the pitch is thrown, NOT before. We'll get to this condition in a minute.
If we focus on xBA (expected batting average), which is a measure based on the launch angle and speed a pitcher allows, they were also among the league leaders, allowing 28 fewer hits than the league average pitcher allowed (or getting 28 more outs).
Let's recap: we know the total for the Astros is +81. We know the individual Astros fielders add up to +32. We know the individual Astros pitchers add up to +28. So the individual Astros players add up to +60. What happened to the other +21? Where did those +21 additional outs go, in order for us to account for the full +81?
This is where the fun happens. If you combine the actual fielding alignment of the Astros, and the actual spray distribution of the batted balls, those two were in synch at a better than league average clip. And that number is +21 more outs. Who deserves credit here? Is it the fielders, PRIOR to the pitch being thrown, for being in the right place at the right time more often than not? Is it the pitchers for allowing the batted balls to be hit closer to the fielders than they otherwise would be, since the pitchers are aware of the location of the fielders? Maybe we don't assign it to the individual players, and instead credit the team? Or maybe it's just plain luck that the balls just happened to land at or near where the fielders were standing.
My preference is that it does not go to the individual players. While you can make the argument it was good luck that the pitchers and the fielders were aligned, I can accept the argument that the team as a whole deserves the credit. And so, it looks like this:
+32: Fielders
+28: Pitchers
+21: Team
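The accounting here is simple subtraction: whatever part of the team total is not captured by the individual fielders or the individual pitchers lands in the Team (Synchronicity) bucket. A sketch:

```python
def synchronicity(team_oaa: float, fielders_oaa: float, pitchers_oaa: float) -> float:
    # Team outs above average not attributable to the fielders (post-pitch) or the
    # pitchers (contact quality): positioning, spray direction, or plain luck.
    return team_oaa - fielders_oaa - pitchers_oaa

print(synchronicity(81, 32, 28))   # Astros 2022: +21 to the Team bucket
# The same arithmetic applies with a single pitcher on the mound (Valdez, Verlander,
# Javier below), give or take rounding differences in the quoted components.
```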
Now, let's focus on individual pitchers. When Framber Valdez was on the mound, the Astros made outs on nearly 70% of all the batted balls, which is exactly the league average. So, the Astros were +0 OAA with Valdez on the mound. The Astros fielders however were +10 OAA: they were actually sensational when Valdez was pitching. Valdez was a bit better than average with his xBA, at +2 OAA. So, we end up with this chart for Valdez:
+10: Fielders (with Valdez pitching)
+2: Pitcher (Valdez)
-11: Team (with Valdez pitching)
It only doesn't add up to 0 due to rounding. We can see therefore that the Astros were poorly positioned when Valdez was the pitcher. Or, they were well-positioned, and Valdez simply didn't accommodate them by not inducing the right spray angle. Or, as I've mentioned, just plain bad luck for not being in synch.
Here's how it looks for Verlander, who was among the league leaders at +21 more outs than the league average pitcher received:
-1: Fielders (with Verlander pitching)
+8: Pitcher (Verlander)
+13: Team (with Verlander pitching)
As you can see, he's kind of the flip-side of Valdez, even though they were in front of identical fielders. Just as you can have two pitchers receive very different run support from the offense, or very different bullpen support, so too can pitchers receive very different fielding support. In this case, the fielders were +10 with Valdez and -1 with Verlander. And the team (or luck or synchronicity) was -11 with Valdez and +13 with Verlander. Maybe the Astros knew how to position themselves with Verlander and not Valdez. Or maybe those pitchers had a different ability to induce balls to be hit closer to the fielders. Or maybe it's just Synchronicity. In any case, we won't be giving any credit or debit to the individual pitchers or fielders here. We'll just set all those results aside and focus on the things we can directly tie to the individual players.
How about Cristian Javier? He had an even more sensational BABIP (batting average on balls in play), and was identical to Verlander at +21 OAA. But he deserved it a lot more than Verlander. Here's his breakdown:
+3: Fielders (with Javier pitching)
+15: Pitcher (Javier)
+3: Team (with Javier pitching)
While he got some fielding support, it was really Javier inducing weak contact that led to getting such a low BABIP.
We are therefore in a position to credit every pitcher for their direct contribution to batted balls in park. We can separate the individual fielding performance and the Synchronicity of the fielding alignment and spray direction of batted balls from the weak or hard contact that the pitcher induces.
The Astros pitchers are +28 OAA as their direct contribution, and we can make all the Astros pitchers add up to that +28 from the +15 of Javier down to the -4 of Maton, and everyone in between. And of course do so for every pitcher on every team. We can separate the pitching performance from the fielding performance and other externalities to which the pitcher does not have a direct contribution.