Saturday, March 23, 2019
WOWY Framing, part 1 of N
Rewind
About 15 years ago, I introduced a concept I subsequently called WOWY (With Or Without You). The idea has its roots in the way the NHL originally introduced plus/minus. Back then, they compared a player's plus as a percentage of all goals scored by his team in his games, and similarly for the minus. My slight adjustment to what the NHL used to do was to compare a player's plus/minus to that of his team without him on the ice. So, team performance with the player, compared to team performance without the player. The difference, after accounting for Random Variation (and other systematic biases), we'd attribute to the player himself.
You can see what I did with pitchers, with and without their catchers. I did it for the baserunning stats (SB, CS, WP, PB, BK, PK). Strangely, I noted this as an afterthought, and never followed up:
I'm not including blocking the plate or framing the pitches, though that last part might be doable (though I'd have to look at the pitcher's age as well; I'm guessing that the above numbers aren't too dependent on the pitcher's age, which may or may not be a good guess.)
I clearly should have taken the next step and simply tried it with walks and strikeouts. And given the results we have all seen on catcher framing at the pitch level, it's likely we WILL find something notable here, using only walks and strikeouts. Enough that we'll be able to do framing across the Retrosheet years. That's the hope anyway. I'll get to it this year.
Now
For now, I'll turn my attention to WOWY Framing, using pitch locations. I'm going to show how simple and straightforward the process is. And then I'll make it A BIT more complex. And maybe in the future, we'll continue to make it a bit more complex than that. There is an R package that helps this process along greatly, and once I can code an R program without an error, I'll finally do that. Until then, we'll SQL our way through this.
The first step, before we even identify the major variables, is to create a baseline. There's no point in diving in and figuring out all the variables if you don't know what your starting point is. And at its most basic, framing is simply about getting called strikes on pitches at the edge of the strike zone. Yes, there is more to it than that. That's how you scare away researchers. "Yeah, but..." Let's not scare them away yet. There's plenty of time for that. For now, let's just bring everyone on board.
A few months back, I showed how often each catcher caught a strike, in what we call The Shadow Zone, which is the region that borders the strike zone.
The average called strike rate in this region is around 47 or 48%, with a range of about +/- 5 or 6 percentage points. Jeff Mathis had 55% called strikes on 1726 pitches, where the NOMINAL average is 47%. In other words, if this nominal average is the TRUE average, he's getting about 0.08 more strikes per pitch in the Shadow Zone. Which, when we multiply by his 1726 pitches, gets us 130 more called strikes. On the bottom end is Mitch Garver with 127 fewer called strikes, in the "same" Shadow Zone.
Now, it's not EXACTLY the same Shadow Zone, and we'll get to that in a sec.
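Here's a minimal sketch of that baseline, in Python rather than the SQL I actually use. The record layout (catcher, zone, call) and the toy pitches are made up for illustration; the only thing taken from the text is the idea of counting called strikes on taken Shadow Zone pitches and comparing to the overall rate.

```python
from collections import defaultdict


def shadow_zone_counts(pitches):
    """Count called strikes and taken pitches per catcher in the Shadow Zone.

    pitches: iterable of (catcher, zone, call) tuples for TAKEN pitches only.
    The field names and zone labels are hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # catcher -> [called strikes, taken pitches]
    for catcher, zone, call in pitches:
        if zone != "shadow":
            continue  # the baseline ignores everything outside the Shadow Zone
        counts[catcher][1] += 1
        if call == "called_strike":
            counts[catcher][0] += 1
    return {c: (k, n) for c, (k, n) in counts.items()}


def extra_strikes(strikes, taken, league_rate):
    """Called strikes above what the nominal average rate would give."""
    return strikes - league_rate * taken


# Toy data, not real pitches.
toy = [
    ("A", "shadow", "called_strike"),
    ("A", "shadow", "called_strike"),
    ("A", "shadow", "ball"),
    ("A", "heart", "called_strike"),  # skipped: not in the Shadow Zone
    ("B", "shadow", "ball"),
    ("B", "shadow", "ball"),
]
counts = shadow_zone_counts(toy)
league_rate = sum(k for k, _ in counts.values()) / sum(n for _, n in counts.values())
extras = {c: extra_strikes(k, n, league_rate) for c, (k, n) in counts.items()}
```

One caution if you try this with the rounded rates quoted above: (0.55 - 0.47) x 1726 comes out to about 138, not 130, so the 130 presumably reflects the unrounded rates.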
Mountains and Pools
My buddies at @SteamerPro and @Fangraphs released their Framing numbers, which is about 4 levels higher than what I've just done. Their version is essentially Mount Everest. Which gives us a chance to compare how close we are with our base version. Is our base version at sea level, or is our base version at Base Camp?
That's a correlation of r=0.88. This means that simply using the called strike rates in The Shadow Zone, without any kind of adjustment whatsoever, we're already at Base Camp.
This becomes an important point here. If we are going to show the catcher framing numbers, with all their (necessary) adjustments, we should AT LEAST show the called strike rates in The Shadow Zone. This is akin to needing to show ERA *and* ERA-. We can't just show Freeland's ERA- of 61, between Nola at 59 and Scherzer at 62. We really need to show their ERA as well (2.85, 2.37, 2.53 respectively). While we undoubtedly need to adjust for parks, and for Coors especially, we also need to show that Freeland's ERA- is LARGELY a product of just plain ole ERA; the park adjustment, while real, is not causing an ERA of 4.56 to be considered equal to Nola and Scherzer.
So, just as a matter of form and suggestion, it would behoove Fangraphs and the other Catcher Framing providers to ALSO show the baseline. Furthermore, it would allow us to talk about this in REAL terms. Mathis caught 55% called strikes on the edges of the strike zone to lead the league. That, by itself, is enough of a talking point. It gets everyone into the wading pool. We can go deeper if we need to, and eventually go to the deep end of the pool for all the adjustments. But, baby steps first. Let's start with splashing our feet.
Shallow Now
Having introduced you to the wading pool, let's now go to the shallow end. We can do a WOWY that will control for the venue, on the idea that the tracking mechanisms at each park are not identical. And so, when we see a 47% called strike rate, it might be slightly different at each park. The park in SF had a 48.9% called strike rate. And when we isolate all the pitcher-catchers who pitched with or against the Giants, and then take those pitcher-catchers at the other 29 ballparks, we see that THOSE battery mates had a 46.6% called strike rate. In other words, we OBSERVE a difference of 2.3 percentage points. Which we need to regress, since some of that is just Random Variation. And I estimate the true difference to be 1.46%. On the other end is Globe Life Park at -1.67%. Coors Field is next at -1.47%, and we'll talk about them with Iannetta soon.
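The venue WOWY can be sketched like this. To be clear about what's assumed: weighting each battery by its pitches AT the park is one reasonable WOWY weighting, not necessarily the one I used, and the regression constant k and all the battery numbers below are hypothetical.

```python
def venue_wowy(batteries):
    """With/without called strike rates for one park.

    batteries: list of (pitches_at_park, rate_at_park, rate_elsewhere),
    one entry per pitcher-catcher pair seen both at the park and away from it.
    Both rates are weighted by pitches at the park (an assumed weighting), so
    the "without" side reflects the same mix of batteries as the "with" side.
    """
    total = sum(n for n, _, _ in batteries)
    with_rate = sum(n * at_park for n, at_park, _ in batteries) / total
    without_rate = sum(n * away for n, _, away in batteries) / total
    return with_rate, without_rate


def regress(observed_diff, n, k):
    """Shrink an observed difference toward zero by the factor n / (n + k).

    k is a hypothetical constant: the sample size at which we'd keep
    half of the observed difference.
    """
    return observed_diff * n / (n + k)


# Two made-up batteries at a made-up park.
batteries = [(400, 0.50, 0.47), (600, 0.48, 0.46)]
with_rate, without_rate = venue_wowy(batteries)
observed = with_rate - without_rate
estimated = regress(observed, n=1000, k=500)
```

For scale, going from 2.3% observed to 1.46% estimated for SF means keeping roughly 63% of the observed gap.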
Having established the effect of each park, we can now do our pitcher-catcher WOWY, and adjust out the venues. And when we do that, we end up with the leaders and trailers of Bartolo Colon (hi Julia!), and James Paxton (hi Ellen!).
Paxton had a 42% called strike rate in The Shadow Zone, against an expectation of 51%, given his catchers, and adjusting for the venue. That's a 9% shortfall, of which our TRUE estimate is about 6%.
Colon had 57% of his pitches called strikes in The Shadow Zone, against an expectation of 43%, or 14% higher, of which our true estimate is about 9%. Note that we haven't even talked about the PARTS of The Shadow Zone, or the pitch trajectory (and Colon, being a notoriously fastball-first and essentially fastball-mostly pitcher, could very well have a much higher expectation than the 43% we are seeing).
Anyway, having now determined adjustment factors for venues and pitchers, we can go back and look at each of our catchers' Base Camp numbers, or wading pool numbers, and bring them into the shallow end. And when we do that, we get this at the top and bottom end (converting each extra called strike at 0.12 runs per pitch):
[Table: top and bottom catchers, with the columns below]
- strike_shadow is the wading pool number
- adj_strike_shadow applies basic adjustment for venue and pitcher (Mathis had favorable context here)
- runs1 converts the extra basic strikes called into runs with a basic 0.12 runs per pitch multiplier
- runs2 uses the adjusted numbers
- Steamer is our Deep End number (thanks to Jared)
As you can see, not much of an adjustment. The correlation does get us a bit closer to Steamer's Deep End numbers (we are now at r=0.89).
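The runs1 and runs2 columns are just the extra called strikes times 0.12. A quick sketch, with a made-up catcher: the 0.12 multiplier and the column names come from above, while the specific rates and pitch count are hypothetical, and I'm assuming adj_strike_shadow is a rate that can be compared to the same league average as strike_shadow.

```python
RUNS_PER_STRIKE = 0.12  # run value of turning one ball into a called strike (from the text)


def framing_runs(strike_rate, league_rate, pitches, runs_per_strike=RUNS_PER_STRIKE):
    """Runs from called strikes above average on Shadow Zone pitches."""
    return (strike_rate - league_rate) * pitches * runs_per_strike


# Hypothetical catcher with a slightly favorable context (venue and pitchers),
# so his adjusted rate is lower than his raw rate.
league_rate = 0.47
pitches = 2000
strike_shadow = 0.52      # raw called strike rate in the Shadow Zone
adj_strike_shadow = 0.51  # after the basic venue and pitcher adjustment

runs1 = framing_runs(strike_shadow, league_rate, pitches)      # about 12 runs
runs2 = framing_runs(adj_strike_shadow, league_rate, pitches)  # about 9.6 runs
```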
Tomorrow
Tomorrow, I'll take the next step, and break up The Shadow Zone into two zones: the part that is inside the strike zone, and the part that is outside the strike zone. Just to whet your appetite, the called strike rate on the inner part of The Shadow Zone is 79% and in the outer part it is 22%. In other words, maybe we'll find that pitchers like Bartolo throw more pitches in the inner part of The Shadow Zone, and so that 57% of his we see might be a product of that. Or not. I don't know, since I haven't checked. And I'm going to bed now. See you tomorrow.
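To see why the inner/outer split could matter, here's the arithmetic as a small sketch. The 79% and 22% are the rates quoted above; the function names are mine, and the whole thing ASSUMES that location mix is the only thing moving a pitcher's rate, which is exactly the part I haven't checked.

```python
INNER_RATE = 0.79  # called strike rate, inner part of The Shadow Zone
OUTER_RATE = 0.22  # called strike rate, outer part of The Shadow Zone


def expected_rate(inner_share):
    """Expected called strike rate for a given share of pitches in the inner part."""
    return inner_share * INNER_RATE + (1 - inner_share) * OUTER_RATE


def implied_inner_share(overall_rate):
    """Inner share that would produce overall_rate if location mix were the only factor."""
    return (overall_rate - OUTER_RATE) / (INNER_RATE - OUTER_RATE)


league_share = implied_inner_share(0.47)  # about 0.44
colon_share = implied_inner_share(0.57)   # about 0.61
```

So under that assumption, a 47% rate corresponds to about 44% of Shadow Zone pitches on the inner part, and a 57% rate would need about 61%. Whether any pitcher's mix actually looks like that is an open question.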
For those asking about runs saved outside the Shadow Zone (meaning in Heart of Plate, Chase Zone), this is what happens when I include those.
Runs2_Shadow is the runs saved in The Shadow Zone, after the adjustments noted.
Runs2 includes all 4 regions (though in the Waste, all pitches are called balls, so, in fact it’s only 3 regions).
As you would have expected, it hardly makes a dent.
The correlation with Steamer goes from r=.892 to r=.899.
Remember our starting point is r=.88 by simply reporting the percentage of called strikes in The Shadow Zone. We are making very minor gains with every adjustment.
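For anyone who wants to run the same kind of comparison, the r values here are plain Pearson correlations between two columns of catcher numbers. A self-contained sketch follows; the two lists are invented for illustration and are not the actual catcher data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical framing runs for five catchers: a simple version vs. a fuller model.
simple_runs = [15, 8, 0, -6, -14]
model_runs = [12, 10, -1, -4, -15]
r = pearson_r(simple_runs, model_runs)
```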