Statcast
Tuesday, January 14, 2020
The theory would be that by being out of position, a fielder will have less familiarity with a situation and so will perform worse than at his "natural" location. If this is true, we should see it in Outs Above Average. I did something fairly simple: what is the OAA for a TEAM INFIELD if we have the shift on? And what is the OAA for a TEAM INFIELD if all the fielders are in their standard location?
Note: A shift is any formation where you have 3 or more infielders to one side of the bag.
So, there is an effect and in the direction you'd expect. But not the magnitude. When an infield is playing in its standard formation, they convert 0.0003 more outs per play. With about 2000 plays a season, that works out to 0.6 more outs per season. Hardly a number to worry over, even if you can make the case that it is "true" that they perform "worse".
HOWEVER. However, because shifts are disproportionately set with a LHH, we can break down the OAA of the team infield between LHH and RHH. And when we do that, well, things start to change. With a RHH, the infield does perform better in its standard formation, by a whopping 0.006 outs per play, which is 12 plays per season. Since a shift with RHH essentially means moving the second baseman from the right side to the left side, it is that positioning that we can narrow down as the culprit. This is also consistent with other research I've shown in the past where the performance of RHH on shifts is noticeably worse for the fielding team.
As for LHH, the OAA is slightly BETTER when the infield is in a shift formation, by 0.0026 outs per play, or 5 outs per season. At this point, the "familiarity" issue likely no longer applies, given that one-third of LHH plate appearances are being shifted. This may also explain why LHH on shifts is somewhat better for the fielding team: in addition to getting the fielders in a better spot, they perform slightly better when in those spots.
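The per-season figures above are just the per-play difference multiplied by the number of plays. A quick sketch of that arithmetic, assuming the roughly 2000 plays per season quoted above is used for every split (the function name is mine, for illustration):

```python
def outs_per_season(outs_per_play, plays_per_season=2000):
    """Convert a per-play OAA difference into outs over a season."""
    return outs_per_play * plays_per_season


# Per-play differences quoted in the post.
print(round(outs_per_season(0.0003), 1))  # all batters, standard vs shift
print(round(outs_per_season(0.006), 1))   # RHH, standard better
print(round(outs_per_season(0.0026), 1))  # LHH, shift better
```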
This is all preliminary, so it'll be interesting to break this down in the coming weeks and months.
Update: I should note that I did not control for the quality of fielders. So, if a team that shifts more happens to do so with better fielders with LHH, then that would explain the results we see. And if a team that shifts more happens to do so with WORSE fielders with RHH, that would ALSO explain the results we see. As I said, this is the first step.
Thursday, January 09, 2020
On 95%+ plays, with an average 97% out rate, Baez made all 147 plays (100%, compared to an expected 142 outs). That's +5.
Tatis on similar plays: on 111 plays, he made only 101 outs (91%) compared to an expected 107, or -6.
In the outfield, the vast majority of the OAA is based on 2+ star plays.
In the infield, HALF the value is on 90%+ plays. This is because the chance of a misplay is so much greater on groundballs than airballs.
Making the routine play has tremendous value.
In case you missed it, you can slice/dice right here.
Wednesday, January 08, 2020
Primer article by Mike on MLB.com
Savant main page by Daren, along with drill down and player pages, with Jason bringing the data together.
My tech blog post: a slimmed down web version, and the expanded downloadable PDF.
Thursday, January 02, 2020
Earlier today, we created our model, and came up with our implementation. In this third part of today's trilogy, we see the results.
Devers has an average of 93.5 mph on his tracked balls, which generates a launch-neutralized xwOBA of .469, which compared to the league average, over 470 batted balls, is +29 runs. That's league-leading. That's how many runs his exit velocity is creating. Here's the top 5 and bottom 3:
- +29 Devers
- +28 Judge
- +27 Cruz
- +27 Yelich
- +27 Soler
- ...
- -24 Merrifield
- -31 Alberto, Hanser
- -35 Fletcher
Here's where the layering comes in: we can compare the xwOBA (converted to runs) based on speed+angle, and remove the runs based on speed we just calculated, to give us the runs based on angle. In other words, who actually has the best launch angle? Here is the run leaderboard:
- +28 Trout
- +28 Bellinger
- +27 Rendon
- +24 Freeman
- +21 Santana, Domingo
- ...
- -17 Ramos, Wilson
- -17 Margot, Manuel
- -18 Simmons, Andrelton
Simmons has a very poor xwOBA. Now we can show how it breaks down: -7 runs based on his exit velocity and -18 based on his launch angle.
This is where we are going with this, with Layered wOBA. Throughout the year, we'll be creating the various components, so we can see where everyone is getting and not getting their value. Their exit speed, launch angle, spray direction, fielding alignment, park, temperature, and whatever else we can think of.
And all this is in preparation for Statcast WAR 2021. We're in the bottom of the fourth, heading for the fifth inning now.
Statcast
This is the Launch Angle distribution, the frequency that a batter will hit the ball at these particular launch angles.
We can then apply that frequency distribution to EVERY batted ball hit. So, if you have a 100 mph batted ball, you assume the above frequency distribution. If you have a 60 mph batted ball, you assume the same distribution.
And for every speed+angle combination, we consult our model to determine the xwOBA. And we weight each angle based on the above frequency. So that for every batted ball, we have a launch-angle-neutralized xwOBA value (LANxwOBA). Let me know if I need to work through an example.
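A minimal sketch of that weighting, to make the idea concrete. The angle frequencies and the stand-in model below are toy values invented for illustration, not Statcast data, and the function names are mine; the real speed+angle model is not shown in this post.

```python
def lan_xwoba(speed_mph, angle_freq, xwoba_model):
    """Launch-angle-neutralized xwOBA for one batted ball.

    angle_freq:  dict of launch angle (degrees) -> league-wide frequency
    xwoba_model: callable (speed_mph, angle_deg) -> xwOBA
    """
    total_freq = sum(angle_freq.values())
    weighted = sum(freq * xwoba_model(speed_mph, angle)
                   for angle, freq in angle_freq.items())
    return weighted / total_freq


# Toy league-wide angle distribution (hypothetical numbers).
TOY_ANGLE_FREQ = {-10: 0.25, 10: 0.35, 25: 0.25, 45: 0.15}


def toy_model(speed_mph, angle_deg):
    """Hypothetical stand-in for the real speed+angle xwOBA model."""
    base = {-10: 0.20, 10: 0.60, 25: 0.50, 45: 0.05}[angle_deg]
    return base * speed_mph / 100.0


def hitter_lan_xwoba(speeds_mph):
    """Average the per-ball LANxwOBA over a hitter's batted balls."""
    values = [lan_xwoba(s, TOY_ANGLE_FREQ, toy_model) for s in speeds_mph]
    return sum(values) / len(values)


# Every batted ball gets the SAME angle distribution, whatever its speed.
print(lan_xwoba(100, TOY_ANGLE_FREQ, toy_model))
print(lan_xwoba(60, TOY_ANGLE_FREQ, toy_model))
```

The only thing that varies from ball to ball is the speed, which is the point of neutralizing the angle.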
Once we do that, we can then average this out by hitter. Every hitter has a LANxwOBA value. And this value represents how much his Launch Speed impacts his wOBA.
Here's how it looks league wide. For the most part, just knowing the average launch speed will also tell you how much wOBA impact a hitter gets out of his launch speed. But every now and then, it's not the case. Deshields for example is not that far off from the league average (*) in AVERAGE speed. But that's the problem when you look at average: we really care about the DISTRIBUTION of speeds. As we saw earlier today:
(*) I'm excluding bunts. I'm also only looking at tracked batted balls.
The speeds below 95 are basically all the same. Whether it's 90mph or 70mph, there's almost no difference, even if the difference is 20mph. The IMPACT is similar. But between 90mph and 110mph, there's a world of difference.
In other words, the average of two 90mph batted balls is the same as one at 70 and one at 110. But the IMPACT is very very different. Two at 90 is around .250 wOBA. One at 70 and one at 110 is an average of almost .700 wOBA!
This is why we care about the DISTRIBUTION of speeds. And when we apply the wOBA values based on the distribution of speeds, what you get from Deshields is a very low impact wOBA.
Statcast
Several weeks back, I introduced the concept of Layered Hit Probability (and by extension Layered wOBA). The objective is straightforward: we see a hit happened, and our job is to simply answer why? and how?. How much was due to the exit velocity? How much to the launch angle? The spray direction? The fielding alignment? The park configuration? The temperature? By layering the various considerations, we can then isolate all those factors.
As a first step, since the one factor that is both the most important as well as linked to the batter himself is the exit velocity, we'll start with that. Now, you may think this is an easy step: check to see how the league does at each exit velocity, use that as the baseline, and compare the batted ball to that baseline. In theory that sounds fine. But, this is what that looks like:
So what does this say? Taken literally, it would suggest not to hit the ball much harder than 112 mph, because after that point, the wOBA flattens or goes down. But, look what happens when we compare the actual wOBA to our model that uses the speed AND angle:
This might give us more insight. The model knows that the harder you hit the ball, the better the outcome... GIVEN a launch angle. So what the above shows is that the really hard hit balls, those close to 120 mph, must have a lot of poor launch angles. And this makes sense: the more flush you hit the ball, the more speed you gain, at the expense of loft. In other words, more line drives and ground balls and fewer fly balls. What you lose in HR, you gain in GB outs. And that's why the wOBA is so low at the very high launch speeds.
What we want to do instead is "neutralize" the launch angle. We take the league-wide spread in launch angles, and apply that to each exit velocity. This way, the 120 mph bucket is made up of EXACTLY the same launch angle distribution as you'd find in the 100 mph bucket... even though we know in practice that does not happen. When we do that, when we neutralize the launch-angle, now we see the effect, the relationship of wOBA to exit velocity. And here we see that starting at around 85mph, wOBA ALWAYS goes up as exit velocity goes up.
And this angle-neutralized wOBA is our baseline, what we will compare our actual wOBA against. And if you find that your actual wOBA is below this baseline, which you can see happens at 120 mph, then we can easily explain it by the next layer: your launch angle wasn't good enough.
Stay tuned, as my next post will give us leaderboards for the Exit Velocity component in the Layered wOBA.
Statcast
Thursday, December 05, 2019
Every year, we come across the same issue: what model do we use for the upcoming season? When I first started, I created the 2016 model, and applied that to 2017. But I was really off. That's because the 2017 environment was different from 2016. So at the end of the season, we had a 2017-specific model, and regenerated all our x-stats. Then I turned over the reins to Statcast Data Jock @TravisRPetersen. He updated all the models, and for 2018, we used the 2017 model. Well, 2018 was a down year, so once again in the off-season, we created a 2018 model and regenned those x-stats. Can you guess the rest of the story? 2019 was an up year from 2018, we had used the 2018 model, everyone was above average. And here we are. So, we're regenning the 2019 x-stats this evening. It's not really going to be noticeable. There is a bias at the league level of about 0.5% in hit probability, so going from 26.0% down to 25.5% (or .260 to .255 in batting average parlance). But this affects everyone about the same, from 0% to 1% (or .000 to .010) for the most part.
You may be wondering: why not adjust it every day? Well, early season, it'll be too low because it's cold. Now we'd have to adjust the actual BA into an actual BA adjusted for temperature. So our xBA would try to match the temperature-adjusted BA. So, now we won't even be comparing an estimate to an actual. We'd be comparing an estimate to an estimate. Furthermore, updating every day is a monumental task, as we'd have to rerun and reupdate every single play retroactively every day. While it is a good thing to do, the question is if the cost is worth it.
So, our plan is we re-assess at the all-star break, and so, we use that to do any retroactive updates and creation of the new season's model. And if the change warrants it, then we update mid-season. If not, we update in the off-season. Like now.
Statcast
When the Marlins made their comeback against the Cubs in the 2003 playoffs, I described how WPA worked on a play by play (and in the case of the fan, pitch by pitch) basis. A couple of weeks ago, I described Layered Hit Probability, all the various layers we have to go through in order to explain the how/why that a play happened.
Sam Miller lays it all out with what we are up against if we try to go to the ultimate, and describe all the baserunning and fielding involved in a play. And he makes the salient point:
To give credit on all of them means building statistical systems that can make assumptions that hold true in as many cases as possible -- and that don't require hours (and that don't rely on personal opinions) for each of them.
What Sam did is identify what we call Action Events. At every Action Event, we stop the play, and understand the landscape. We identify what is the run potential (actually win potential) at that point in time. Then we fast forward to the next Action Event and ask the same question. And we capture that change, and assign that change to the change agent(s) between the two Action Events. And on and on we go, much like I described with the Marlins/Cubs, but far more in-depth, as Sam has done. With the key point that we make sure it all adds up, as Sam showed.
And once we have it all broken down for all plays in an inning or a game or a season, we can tally it all. You can see it in the Cubs/Marlins:
The tally:
Prior + SS = +.076
Prior + Alou = +.051
Remlinger = +.001
Remlinger + Fielders = -.016
Dusty = -.017
Fan = -.031
Prior = -.051
Gonzalez = -.184
Farns + OF = -.271
Prior + OF = -.476
Manager = -.017
Fan = -.031
Pitchers = -.368
Fielders = -.502
TOTAL: -.918 (.018 – .936 = -.918)
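The "it all adds up" property can be checked directly. The sketch below simply re-adds the two breakdowns listed in the tally (by change agent, then by role) and confirms both reach the same total; the dictionary names are mine, and how the shared entries were split into the role totals is not shown in the post, so only the sums are compared.

```python
# Win probability changes from the tally, by change agent.
by_agent = {
    "Prior + SS": 0.076,
    "Prior + Alou": 0.051,
    "Remlinger": 0.001,
    "Remlinger + Fielders": -0.016,
    "Dusty": -0.017,
    "Fan": -0.031,
    "Prior": -0.051,
    "Gonzalez": -0.184,
    "Farns + OF": -0.271,
    "Prior + OF": -0.476,
}

# The same changes, grouped by role.
by_role = {
    "Manager": -0.017,
    "Fan": -0.031,
    "Pitchers": -0.368,
    "Fielders": -0.502,
}

agent_total = round(sum(by_agent.values()), 3)
role_total = round(sum(by_role.values()), 3)

print(agent_total, role_total)
```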
And the kicker is going to be that once we have a Statcast WAR, while we may be able to explain the PLAYS, we may be introducing a bunch of random variation into a PLAYER. We'll be taking three steps forward on explaining baseball, but we may be running in place in explaining a baseball player. This is why FIP has such a strong footprint, taking the bird's eye view in explaining a baseball player. You have to be careful in conflating the IDENTITY of the players involved in a play, with the INFLUENCE of the player (as opposed to the effect of random variation). And this gets into the bittersweet symphony of explaining baseball, which I tried to describe in this two-part thread from a while ago.
Sunday, November 24, 2019
In an excellent article on Catcher Framing, Mike created this image at the team level, which shows the percentage of called strikes in The Shadow Zone.
He further pointed out:
The top team, Arizona, and the bottom team, Chicago, each had a nearly identical amount of takes in that area, 4,819 for the D-backs and 4,803 for the White Sox. Yet the D-backs, led by good framing from Carson Kelly and Alex Avila, had over 400 more called strikes there.
This puts the impact in stark terms. Looking at the called strike rate in The Shadow Zone, one catching team can get 200 more strikes than the average team, while another catching team can get 200 fewer strikes. How much value CAN a strike have? I can tell you the answer is 0.125 runs per called strike, and so, we're talking about +/- 25 runs.
But, let's describe it in something a bit cruder, but with more relevance. If you think of 3 strikes being a strike out, and 9 strikes being an inning, then 200 called strikes would be about 22 perfect innings. Each inning generates an average of 0.5 runs, and so, a clean inning saves you 0.5 runs. If you have 22 of those, then you've saved 11 runs. That's the crude way. The better way is 0.125 runs per called strike.
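Both estimates are one-line calculations. A small sketch comparing them, using only the figures quoted above (0.125 runs per called strike, 9 strikes per inning, 0.5 runs per inning); the function names are mine:

```python
RUNS_PER_CALLED_STRIKE = 0.125


def framing_runs(extra_strikes):
    """The better way: a flat run value per extra called strike."""
    return extra_strikes * RUNS_PER_CALLED_STRIKE


def crude_framing_runs(extra_strikes, strikes_per_inning=9, runs_per_inning=0.5):
    """The crude way: treat every 9 strikes as one clean inning saved."""
    clean_innings = extra_strikes / strikes_per_inning
    return clean_innings * runs_per_inning


print(framing_runs(200))               # 25.0
print(round(crude_framing_runs(200)))  # 11
```

The crude version lands at less than half the value, which is why the post treats it as an intuition aid rather than the estimate to use.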
As for simply relying on the called strike rate in The Shadow Zone, we can compare that to the runs saved on the strike calls per 100 pitches. As you can see, an extremely strong relationship. Indeed, an r of close to 0.95. So, if you are having a tough time buying into Catcher Framing and runs and how all that is derived, you can take the first step and simply look at its most basic: percentage of pitches called strike in The Shadow Zone. If you can do that, you'll be 90% of the way there.
Monday, October 21, 2019
On Sept 15, 2008, at PNC Park, Dodgers catcher Russell Martin caught 19 called pitches in the inside part of the Shadow Zone. That would be zones 11 through 19, within the green dotted line.
While today, those are called strikes almost 80% of the time, it wasn't the case back in 2008. That could be any combination of the umpires improving over time and the tracking system improving over time. So, it would be more accurate to say that he caught those 19 pitches in the reported region noted above. Of those 19, 14 were called strikes.
In that same ballpark on that same day, his teammate A.J. Ellis also caught, as did opposing catcher Ryan Doumit. Those catchers caught 18 pitches in the same reported region, but got only 4 of them called strikes (or 22.2%). Had Martin gotten the same calls, he would have gotten 19 x 22.2% = 4.2 strikes, instead of his actual 14. In other words, he got 9.8 more called strikes than the other catchers that day in that park.
On April 2nd against the Giants at Dodger Stadium, in the outside part of the Shadow Zone, with Bengie Molina as his opposing catcher, he got 3 strikes out of 20 pitches compared to Molina's 7 for 13. That made Martin MINUS 7.8 strikes that day.
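The same-park, same-day comparison can be written as a small function. This is a sketch of the arithmetic in the two examples above, with a function name of my own choosing:

```python
def strikes_vs_peers(catcher_strikes, catcher_pitches, peer_strikes, peer_pitches):
    """Extra called strikes relative to the other catchers in the
    same park on the same day, in the same reported region."""
    peer_rate = peer_strikes / peer_pitches
    expected_strikes = catcher_pitches * peer_rate
    return catcher_strikes - expected_strikes


# Sept 15 at PNC Park, inside Shadow Zone: 14 of 19 vs peers' 4 of 18.
print(round(strikes_vs_peers(14, 19, 4, 18), 1))  # 9.8

# April 2 at Dodger Stadium, outside Shadow Zone: 3 of 20 vs Molina's 7 of 13.
print(round(strikes_vs_peers(3, 20, 7, 13), 1))   # -7.8
```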
And so we can go through every single game in the same way, and tally up the results. In the Heart of the Plate, he was +63 strikes (+35 at Dodger Stadium, +28 away). However, we would NOT expect any venue bias because of the way we are directly comparing Martin to the other catchers in the same venue on the same day.
- In the inside part of the Shadow Zone, he was +43 at home, +45 away, for a total of +88.
- In the outside part of the Shadow Zone: +29 home, +7 away.
- In the Chase Zone: +39 home, -6 away.
- In the Waste Area: +1 home, 0 away.
All tallied up: +147 home, +74 away, +221 total. Each strike is about 1/8th of a run, and so those +221 strikes translate to +28 runs.
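For anyone who wants to check the tally, here is a short sketch that re-adds the zone-by-zone home and away numbers listed above and converts strikes to runs at 1/8 of a run each (the variable names are mine):

```python
# (home, away) extra called strikes by zone, as listed above.
extra_strikes = {
    "Heart of the Plate": (35, 28),
    "Shadow Zone, inside": (43, 45),
    "Shadow Zone, outside": (29, 7),
    "Chase Zone": (39, -6),
    "Waste Area": (1, 0),
}

home = sum(h for h, _ in extra_strikes.values())
away = sum(a for _, a in extra_strikes.values())
total = home + away
runs = total / 8  # each strike is about 1/8th of a run

print(home, away, total, round(runs))  # 147 74 221 28
```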
In a more elaborate process that considers more variables and the zone in a more granular fashion, Fangraphs shows +30 runs.
When I repeat this for every year, Martin's career comes out to +171 runs. Fangraphs has a very similar +166 runs.
As much as it strains credulity to think that Martin's framing could have led to +28 runs, I also can't reject that conclusion. I can reduce that number somewhat for the uncertainty level of the measurement. But given the way I controlled for the metric, by directly comparing Martin to the other catchers in the same park on the same day, that's a tough call as well.
I could repeat the above by focusing on each individual bin and controlling for the pitcher, and potentially the batter. But that basically will put me on a path to replicate Fangraphs. And given that without doing any of that I ALREADY match Fangraphs, all I'd be doing is further matching Fangraphs.
So, I don't want to agree with the numbers, but I am forced to.
I should note that we don't see these wide numbers in the past few years. That could be any combination of the umpires improving and the tracking system improving. It could also be that teams are now very aware not to have a Ryan Doumit behind the plate, so it could be improvement in catcher selection and coaching of catchers. In other words, whatever inefficiencies exist, they're being slowly closed on all sides.
Wednesday, October 02, 2019
This is the point at which Cain got the ball.
Runner is about 75 feet from 3B. Taylor's Sprint Speed is 29 ft/s, meaning he needs 75/29 = 2.6 seconds.
Cain will have to make an almost 200 foot throw. He has a somewhat below average arm at 85 mph. Here's where we need to leave the world of mph and enter the world of feet / sec. 85mph is 125 ft/s. That's at release. The ball will slow down in flight. Roughly speaking, it'll lose 10% every 60 feet.
In this case, we'd do 200/60 = 3.33, and 0.9^3.33 = 70%. So at arrival, the speed of the ball is 70% of 125 ft/s or 88 ft/s. So the average speed of the ball in flight is about 106 ft/s. And so, a 200 foot throw will get there in about 200/106 = 1.9 seconds. (It's not this straightforward, but it's close enough.)
The exchange time (pickup to release) for a throw is about 0.5 to 0.75 seconds, which means that the ball would have reached the VICINITY of 3B in 2.4 to 2.65 seconds. It would have been close if the throw was on target. Which of course, it might not be.
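The timing estimate above can be reproduced in a few lines. This is a sketch of the post's rough method only: the ball loses about 10% of its speed every 60 feet, and the flight time uses the simple average of release and arrival speed (which, as noted, is close enough rather than exact). Function names are mine.

```python
MPH_TO_FTPS = 5280 / 3600  # 1 mph is about 1.467 ft/s


def runner_time(distance_ft, sprint_speed_ftps):
    """Seconds the runner needs to cover the remaining distance."""
    return distance_ft / sprint_speed_ftps


def throw_flight_time(distance_ft, release_mph, loss_per_60ft=0.10):
    """Rough flight time: speed decays 10% per 60 ft, and we divide
    the distance by the average of release and arrival speed."""
    release = release_mph * MPH_TO_FTPS
    arrival = release * (1 - loss_per_60ft) ** (distance_ft / 60)
    average = (release + arrival) / 2
    return distance_ft / average


flight = throw_flight_time(200, 85)
print(round(runner_time(75, 29), 1))  # 2.6 seconds for Taylor
print(round(flight, 1))               # 1.9 seconds for the throw

# Add the 0.5 to 0.75 second exchange to get the ball's arrival window.
print(round(flight + 0.5, 2), round(flight + 0.75, 2))
```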
How successful would Cain have been? Probably 60% if the throw is on target. And maybe it's on target 70% of the time? So, about 40% of the time he gets the runner maybe?
In the meantime, it would allow the batter to reach second base as the tying run. But, there were two outs! Making the third out at third base is a cardinal sin for baserunners. Which makes it very appealing for the defense.
Let's work some MORE numbers.
http://tangotiger.net/we.html
Bottom of the 8th, 2 outs, down by 2 runs. Our choices are:
- runners on 1B and 3B (our baseline)
or
- runner on 2B and 3B
- end of inning
So, our baseline is a win expectancy for the Nationals of 15.8%.
- If Cain went for it and missed, then the win expectancy is 19.2%.
- If Cain got the out, then the win expectancy for the Nats is 7.1%.
In other words, the tradeoff is that the Nats get +3.4% if Cain doesn't hit the target in time, or the Nats are -8.7% if Cain gets Taylor to end the inning.
All Cain has to do is make the play 28% of the time. That is:
- 28% of the time, the Nats lose 8.7%
- 72% of the time, the Nats gain 3.4%
And that's breakeven.
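The breakeven rate falls out of the three win expectancy numbers. A sketch, using the values quoted above and the rough 60% and 70% guesses from earlier in the post (the function name is mine):

```python
def breakeven_rate(we_baseline, we_if_miss, we_if_out):
    """Success rate at which throwing to third is a wash for the defense.

    All inputs are the batting team's win expectancy, in percent.
    Solves p * loss_if_out = (1 - p) * gain_if_miss for p.
    """
    gain_if_miss = we_if_miss - we_baseline  # batting team gains on a miss
    loss_if_out = we_baseline - we_if_out    # batting team loses on an out
    return gain_if_miss / (gain_if_miss + loss_if_out)


needed = breakeven_rate(15.8, 19.2, 7.1)
print(round(needed, 2))  # 0.28

# The post's rough guess: 60% if on target, on target 70% of the time.
guessed = 0.60 * 0.70
print(round(guessed, 2), guessed > needed)  # 0.42 True
```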
Remember, we guessed that Cain would have gotten Taylor about 40% of the time, and he only needed to get him about 28% of the time.
Cain should have gone to third.
Continuing my look at uncovering park biases, if any, I now turn my attention to Pitch Speed.
The typical way I have done this in the past is the WOWY (with or without you) approach. It's fairly straightforward, if a bit tricky to code. You look at pitchers at each park, and compare them to their own speeds in the rest-of-league parks. So, at Fenway and away from Fenway (and not just Red Sox pitchers, but ALL pitchers who pitched at Fenway and away from Fenway). You figure out their difference in speeds, weighted by the lesser of their number of pitches in the "two" parks. Here's how that looks for 2018 and 2019. (click to embiggen)
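Since the WOWY calculation is described as a bit tricky to code, here is a minimal sketch of the weighting step for one park. The input layout and the two sample pitchers are hypothetical, made up purely to exercise the function; only the "weight by the lesser of the two pitch counts" rule comes from the description above.

```python
def wowy_park_effect(pitchers):
    """Weighted average speed difference (in park minus elsewhere).

    pitchers: iterable of tuples
        (speed_in_park, pitches_in_park, speed_elsewhere, pitches_elsewhere)
    Each pitcher is weighted by the lesser of his two pitch counts.
    """
    weighted_diff = 0.0
    total_weight = 0
    for in_speed, in_n, out_speed, out_n in pitchers:
        weight = min(in_n, out_n)
        weighted_diff += weight * (in_speed - out_speed)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no pitcher has pitches both in and out of the park")
    return weighted_diff / total_weight


# Hypothetical sample: two pitchers who threw both in and away from a park.
sample = [
    (94.0, 100, 93.5, 900),  # +0.5 mph in the park, weight 100
    (91.0, 50, 91.2, 40),    # -0.2 mph in the park, weight 40
]
print(round(wowy_park_effect(sample), 2))  # 0.3
```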
Now, simply getting non-zero values doesn't mean there is a bias. We have to figure out how much random variation could have contributed to that. We see in the above that Yankee Stadium appears at the top in both years, while Globe Life Park was up one year and down the other. This is a good sign that we've got some level of random variation. A correlation of the two gives us an r of 0.50. This means that about half of what you see (using this method) is signal and the other half is noise. So seeing +0.30 in 2019 for Yankee Stadium would mean there is a bias of 0.15 mph. Every other park in 2019 is less than +/- 0.1 mph.
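A sketch of that step: compute the year-to-year correlation of the park values, then, following the post's reading of r as the share that is signal, scale the observed value by r. The Pearson function is written out by hand so the block has no dependencies; the function names are mine.

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def estimated_bias(observed_diff_mph, year_to_year_r):
    """Keep only the signal share of an observed park difference."""
    return observed_diff_mph * year_to_year_r


# Yankee Stadium 2019: +0.30 mph observed, r of 0.50 between the two years.
print(round(estimated_bias(0.30, 0.50), 2))  # 0.15
```

In practice `pearson_r` would be fed the 30 park values from 2018 and the 30 from 2019.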
This is a very weak bias. And it's not even clear that this bias would necessarily be at the tracking level. There could be environmental reasons where the release speed is higher in one park or the other.
As I've linked above, and you have seen in my blog the past few months, I have a clearer method to look for park bias: we compare the home pitchers to the away pitchers in the same park. If for example Citi Field is (literally) home to fireballers, we would not expect the tracking of the away pitchers to also have a high pitch speed. But, if the Mets pitchers aren't that (pun intended) hot, but the tracking is showing them high, we'd expect the away pitchers to also have their speeds read hot.
So, a flat line shows zero bias, and a sloped line at 45 degrees shows complete bias. Here's how it looks in 2018 and 2019, limited to fastballs and sinkers only (click to embiggen):
To say I was sabermetrically ecstatic when I ran this a few minutes ago is to put it mildly. Citi Field tracks the home pitchers hot, and the away pitchers not. Which is what you'd expect on a team of fireballers. Yankee Stadium does show the away pitchers slightly hot, consistent with the WOWY approach I just presented.
However, we can't just look at individual points. The key is to look at all 60 points. And all 60 points are scattered all over the place, with no correlation at all between the fastball speeds of home pitchers to their peers in the same park.
Also note that the range in speeds of home pitchers is quite wide, at +/- 1.5 mph, while that of the away pitchers (made up of basically every other pitcher in the league) is +/- 0.5 mph (or -0.6 to +0.4).
As I do a year-end analysis of all the data points you've seen me post about in the past few years, I will run these home/away park bias reports, so we can see the extent to which biases exist (if any), and how we need to correct them (as we saw with the Catcher Framing).
Monday, September 23, 2019
In other words, do we need to worry about DIPS (or FIP)? No.
Justin Verlander has allowed 487 batted balls, and gotten 349 outs. If we focus only on the quality of contact (launch angle+speed), we'd have expected he gets 345 outs. So, he got only 4 more outs than expected outside of his influence. If you wanted to stop reading here, you'd be fine: Verlander's results are consistent with his individual contributions.
***
What did we not control for? The fielding talent of his fielders, the team alignment of his fielders, and the spray direction of his batted balls. Naturally, all 3 of them are interlinked.
We can focus on the fielding talent of his fielders first. When he was on the mound, his fielders were a little bit above average. How much above average? +4 outs. That is, based on how much distance they had to cover, and how much time they had to get there, the Astros fielders got 4 more outs than average.
In other words, we can explain how he got his 4 extra outs. And therefore, we give Verlander credit for getting 345 outs on 487 batted balls. The league average is 65.5%, and so on 487 batted balls, a league average pitcher would have gotten 319 outs. Since he actually got 345 (after accounting for the talent of his fielders), Verlander is +26 outs.
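The credit calculation above, written out as a sketch. The league expectation is rounded to a whole number of outs, as in the text; the function name is mine.

```python
def pitcher_outs_above_average(batted_balls, outs, fielder_support_outs,
                               league_out_rate):
    """Outs credited to the pitcher beyond a league-average pitcher,
    after removing the outs attributed to his fielders' talent."""
    credited_outs = outs - fielder_support_outs
    league_outs = round(batted_balls * league_out_rate)
    return credited_outs - league_outs


# Verlander: 487 batted balls, 349 outs, +4 from his fielders, 65.5% league rate.
print(pitcher_outs_above_average(487, 349, 4, 0.655))  # 26
```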
Now, what about the fielding alignment and spray direction? So, this is an interesting question. The Astros rarely shift on RHH with Verlander pitching, while they always shift on LHH with Verlander pitching. Since Verlander is obviously well aware of the fielding alignment behind him, he is pitching to that alignment. If he can get the hitters to hit to where the fielders are, I'd contend this tells us more about Verlander than the fielders.
Now watch this. With RHH, Verlander got 181 outs, while getting almost 4 outs of support from his fielders' fielding talent. So, that's 177 outs otherwise. And based on quality of contact (speed+angle only), we expected 177 outs.
With LHH, Verlander got 168 outs, with no extra fielding support. Based on quality of contact, we expected 168 outs.
In other words, whether massively shifting all the time, or never shifting, the number of outs that Verlander got is entirely determined by the quality of contact. That is, we can safely ignore the fielding alignment, if we can also ignore the spray direction.
Verlander has a .246 wOBA and a .247 xwOBA. That he happens to have an historically low .218 BABIP is inconsequential.
Run values of HR
How do we know that a HR will add an average of around 1.4 runs? You can look at it from a pretty high level view and simply look at how many runs a team scores when they hit 0 HR and when they hit 1 HR. When I did this some 15 years ago, the answer was 3.08 runs scored in games with 0 HR and 4.62 runs in games with 1 HR. Taking the huge leap of "all other things equal" (ceteris paribus), that difference of 1.54 runs we would attribute entirely to that 1 HR. In games with 2 HR, there were 6.12 runs scored, or 1.50 more runs than the 4.62 runs that scored with 1 HR. So, we attribute that 1.50 runs entirely to the second HR.
Now, that huge leap of "all other things equal" can be verified. And indeed, in games where there are no HR, those games also feature a bit less of other hits and walks. In other words, things are not equal. Once we account for that, the end result is closer to 1.4 runs being added by the HR.
RE24
We can get there in other ways. We can look at the 24 base-out states. Bases Empty, 0 outs? That's a base-out state. Runner on 3B, 1 out? That's a base-out state. Runners on the corners, 2 outs? That's a base-out state. There are 8 different combinations of base states, and obviously 3 states for the outs. And so we have 8 x 3 = 24 base-out states.
Each base-out state has its own run potential. Bases empty 0 outs is about 0.5 runs. That's because in a 4.5 runs per 9 inning environment, you would score 0.5 runs per inning. Meaning that your initial state, the bases empty, 0 outs state, is therefore worth 0.5 runs. What we care about at the START of a state is the POTENTIAL to score, the expectancy. And each of the 24 base-out states has its own run potential, what we call the Run Expectancy chart.
And as you transition from one base-out state to another, that difference we attribute to the event that caused that change. In other words, the event is a causative agent, and we track the change in run expectancy for each event.
A HR with the bases empty 0 outs for example will change the run expectancy from 0.5 runs to 0.5 runs (or a delta of 0 runs), but we added 1 run to the bank. So, the change in run potential + actual runs is exactly 1.0. With bases loaded 2 outs, the starting run expectancy is around 0.8 runs, and the ending run expectancy (bases empty 2 outs, since the HR cleared the bases) is 0.1 runs (plus the 4 runs in the bank). So the change in run expectancy is 4 plus 0.1 minus 0.8 = 3.3 runs. We therefore give 3.3 runs to the HR.
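The run value of an event is the runs banked plus the change in run expectancy. A minimal sketch using only the approximate run expectancy values quoted above (the function name is mine):

```python
def event_run_value(re_before, re_after, runs_scored):
    """Runs banked on the play plus the change in run expectancy."""
    return runs_scored + re_after - re_before


# HR, bases empty, 0 outs: 0.5 -> 0.5, with 1 run in the bank.
print(round(event_run_value(0.5, 0.5, 1), 1))  # 1.0

# Grand slam with 2 outs: 0.8 -> 0.1, with 4 runs in the bank.
print(round(event_run_value(0.8, 0.1, 4), 1))  # 3.3
```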
You go through all 24 base-out states, for every HR that was hit, and you will find that the average change in run expectancy is 1.4 runs. So, the RE24 for a HR is 1.4 runs. We repeat this for all events, and we get the run value of each event, from almost -0.3 runs for an out to +1.4 runs for a HR.
However, we are going to maintain the run values of each event for each base-out state. We will give credit to a bases loaded 2-out HR far more than a bases empty HR.
RE12
Now, there are also 12 ball-strike (plate count) states. Going from a 0-0 count to a 0-1 count for example decreases the run expectancy by about 0.05 runs. And so we attribute that change in run potential to the strike. A first pitch strike is therefore worth close to 0.05 runs. If this was a 3-2 count, and you get a strike (and hence a strikeout), you would go from around +0.06 runs to -0.27 runs, or a change of -0.33 runs. This I call the RE12 run values.
RE288
We can now MERGE the 24 base-out states with the 12 plate count states (RE24x12 or RE288) to get a run potential for every pitch at every base-out state. A bases empty 0-2 count has a run potential of 0.42 runs. A HR will bring this back to a bases empty 0-0 count (worth 0.51 runs in this chart), along with the HR (1 run), for a difference of 1.09 runs. So an 0-2 HR is worth +1.09 runs.
We can therefore give a run value to every single pitch that we have, over 700,000 such run values. And once we do that, we can aggregate those run values along any dimension we like. Like for example, where in the strike zone the pitch crosses. We created 4 Attack Regions.
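A sketch of both steps: assign a run value to a pitch from the run potential before and after it, then sum those values along whatever dimension you like. The 0-2 HR uses the 0.42 and 0.51 values quoted above; the three sample pitches, their field names, and their run values are hypothetical, there only to show the aggregation.

```python
from collections import defaultdict


def pitch_run_value(re_before, re_after, runs_scored=0):
    """Change in run potential across one pitch, plus any runs scored."""
    return runs_scored + re_after - re_before


def aggregate_run_values(pitches, key):
    """Sum pitch run values along one dimension (region, swing/take, ...)."""
    totals = defaultdict(float)
    for pitch in pitches:
        totals[pitch[key]] += pitch["run_value"]
    return dict(totals)


# Bases empty, 0-2 count HR: 0.42 -> 0.51, plus the run that scored.
hr_value = pitch_run_value(0.42, 0.51, runs_scored=1)
print(round(hr_value, 2))  # 1.09

# Hypothetical pitches, for illustration only.
sample_pitches = [
    {"region": "Heart", "swing": True, "run_value": hr_value},
    {"region": "Shadow", "swing": False, "run_value": -0.05},
    {"region": "Shadow", "swing": True, "run_value": -0.10},
]
by_region = aggregate_run_values(sample_pitches, "region")
by_swing = aggregate_run_values(sample_pitches, "swing")
print({k: round(v, 2) for k, v in by_region.items()})
```

Passing a different `key` is all it takes to switch from Attack Region to Swing/Take.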
Attack Regions
Each of those regions represents something real in baseball.
- The Heart of the Plate is what the batter is waiting for and the pitcher is avoiding. Pitchers throw about 25% of their pitches here. Batters will generally swing at these pitches almost 75% of the time. And if they don't swing, it's a called strike.
- The Shadow Zone is the area that straddles the strike zone on both sides: called pitches here are basically 50/50 ball/strikes. Swings generally result in below average results (notably swings and misses). Pitchers are really targeting this region, with over 40% of their pitches coming here, and batters swinging just over 50% of the time.
- The Chase Region is where pitchers are trying to get batters to chase. Almost 25% of their pitches are here, and batters swing at almost 25% of those pitches, almost always with poor results. But when they take, it's a called ball.
- The Waste Area covers pitches well off the plate, into the batter's boxes. Fewer than 10% of pitches are thrown here, with batters still swinging a bit over 5% of the time. Virtually no pitcher is actively targeting the Waste Area. And no batter at all ever wants to swing at such a pitch.
We can therefore show the run values for each of these 4 regions, like so.
Swing/Take and more
Furthermore, we can also aggregate the run values along Swing/Take. Alvarez adds 14 runs by swinging 593 times and adds 21 runs by taking 768 times.
And finally, we can aggregate based on BOTH the Attack Region AND the Swing/Take. Alvarez adds 7 runs by swinging 296 times at pitches in The Shadow Zone.
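As a rough sketch of what "aggregating along a dimension" means, the snippet below sums per-pitch run values by Attack Region, by Swing/Take, and by both at once. The records and their run values are made up for illustration (they are not Alvarez's or anyone's real data), and the tuple layout is an assumption.

```python
from collections import defaultdict

# Hypothetical per-pitch records: (attack_region, swing_or_take, run_value).
# These numbers are invented for the example, not Statcast data.
pitches = [
    ("Heart", "swing", 0.10),
    ("Heart", "take", -0.05),
    ("Shadow", "swing", -0.03),
    ("Shadow", "swing", 0.08),
    ("Chase", "take", 0.04),
    ("Waste", "take", 0.06),
]

def aggregate(records, key):
    """Count pitches and sum run values for each value of key(record)."""
    totals = defaultdict(lambda: {"pitches": 0, "runs": 0.0})
    for record in records:
        bucket = totals[key(record)]
        bucket["pitches"] += 1
        bucket["runs"] += record[2]
    return dict(totals)

by_region = aggregate(pitches, key=lambda r: r[0])
by_swing_take = aggregate(pitches, key=lambda r: r[1])
by_both = aggregate(pitches, key=lambda r: (r[0], r[1]))
```

The same function handles any other slice (pitch type, plate count, base-out state) by passing a different key, which is also why it is easy to slice so finely that each bucket holds only a handful of pitches.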
And you can check it out for any hitter (and pitcher!) right here on Savant.
And for switch-hitters, we also allow you to toggle between LHH and RHH if you so choose, by clicking on the batter image.
What's next? Aggregating by pitch types (fastball, changeup, curve, etc). Or aggregating by plate count. Or aggregating by base-out state. Or by any/all combinations discussed in this blog post. Of course, at some point, you will slice/dice the combinations to such an extent that nothing useful will materialize, so you'll have to be careful. But we're working towards giving you that capability.
In the meantime, you can check out analysis from around the web, like this terrific piece that came up (somehow) minutes after the site became live.
Monday, September 02, 2019
About six months ago, in introducing a simple way to create the Catcher Framing metric, I also showed how to quickly test for park bias in that metric. It actually can apply to any metric. In any sport.
Let's apply this concept to the exit speed of a batted ball. The key to the concept is that we presume no relationship in talent between the home batters (and opposing pitchers) compared to the away batters (and home pitchers). What we do is for each park we figure the average exit speed for the home batters (or the bottom of the inning) and the away batters (or the top of the inning). In Fenway 2019 for example, the exit speed on the bottom of the inning was 90.7 mph (or +1.9 mph above league average) and in the top of the inning it was -0.1 mph from league average. We repeat this for all 30 parks, for the five years of Statcast.
If there is no correlation at all, and there shouldn't be based on our assumption of fact, we'll get an r close to 0. If we do get a larger correlation, that would point to some sort of park bias. That bias could be the tracking system. It could also be the players responding to the peculiarities of the park. And what do we get? r=0.06. In effect, an r close to 0, and therefore showing no park bias.
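For anyone who wants to try this, here is a minimal sketch of the test: one pair of numbers per park-season (home batters versus away batters, each relative to league average), then a plain Pearson correlation across all the pairs. Only the first pair is the Fenway 2019 figure from above. The other four pairs are invented so the example runs, and the function name is mine.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Exit speed vs league average (mph), one entry per park-season.
# First entry: Fenway 2019 from the text. The rest are hypothetical.
home_batters = [1.9, -0.4, 0.3, -0.8, 0.5]
away_batters = [-0.1, 0.2, -0.3, 0.1, 0.4]

r = pearson_r(home_batters, away_batters)
```

An r near 0 across the full set of park-seasons is what you would expect if the park (or its tracking system) is not pushing both sides in the same direction.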
Aspiring saberists can use this technique, in any of the sports, to look for biases in metrics, whether measured like I am doing here, or calculated, as I did with the Catcher Framing.
Saturday, August 24, 2019
No.
This is what I did. I looked at all batters in 2018-19 who had at least 100 batted balls in each season. For each hitter, I tracked the frequency of their launch angle in the sweet spot (8 to 32 degrees, where all the solid hits and HR come), less than 8 degrees (basically GB and very low line drives) and more than 32 degrees (basically high FB or popups).
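The three launch angle buckets can be sketched as follows. Treating 8 and 32 degrees as inside the sweet spot is my assumption (the text only says "8 to 32"), and the function and bucket names are illustrative.

```python
def angle_bucket(launch_angle):
    """Classify one batted ball by launch angle, in degrees."""
    if launch_angle < 8:
        return "low"         # basically GB and very low line drives
    if launch_angle <= 32:
        return "sweet_spot"  # 8 to 32 degrees, endpoints assumed inclusive
    return "high"            # basically high FB or popups

def bucket_frequencies(launch_angles):
    """Share of a hitter's batted balls that fall in each bucket."""
    counts = {"low": 0, "sweet_spot": 0, "high": 0}
    for angle in launch_angles:
        counts[angle_bucket(angle)] += 1
    total = len(launch_angles)
    return {name: count / total for name, count in counts.items()}
```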
I classified each hitter based on their change from 2018 to 2019 in the frequency of the above, and dumped them into 5 groups. The group that lowered their launch angle the most dropped by an average of 3.5 degrees. Those that raised their launch angle the most increased by an average of 4.4 degrees. The other three groups were: -0.8, +0.1, +2.0.
Now, why would we think there might be a change? Well, it's the mishits. At the top end of exit velocity, we wouldn't expect much, if any, change. But the more your swing plane deviates from the oncoming pitch plane, the less flush you might hit the ball, and so the more prone you are to mishits, which will reduce your exit velocity. Of course, you are also more prone to a complete swing and miss, which turns a mishit (and its reduced exit velocity) into no hit at all, and so removes it from the sample!
Anyway, here's the change in exit velocity for our 5 groups:
- +0.3 mph: Lowered Angle (Major)
- -0.1 mph: Lowered Angle (Minor)
- +0.4 mph: Neutral change in Angle
- +0.2 mph: Uppered Angle (Minor)
- +0.5 mph: Uppered Angle (Major)
As you can see, no real trend here. We do see an overall increase in speed from 2018 to 2019 of about 0.3 mph, which is consistent with the 0.4 mph that I reported earlier.
My next step will be to look at whether the exit velocity distribution changes by launch angle. You would THINK that if you lowered your launch angle, then your higher exit velocity will now happen at the lower launch angles. And similarly, if you increase your launch angle, the higher exit velocity will now happen at the higher launch angles. But there are reasons to think that it shouldn't matter. I haven't looked at the data, so as soon as I do, I'll post my findings.
Tuesday, August 20, 2019
A few days ago, some people were talking about approaches on 3-2 counts. I expanded that to look at any 3-ball count. And I focused on pitches that were in the Chase Region or Waste Area. Basically, these are pitches that are automatic walks (you are in a 3-X count, and the pitcher is throwing it well off the plate). But not all batters will take all the time. A batter is essentially "refusing" to take the walk at this point. Here are the "walkaway" rates for hitters who are in a 3-ball count the most often.
There are three things that can happen:
- Smartly accepts ball 4
- Luckily rejects ball 4, by somehow getting a basehit
- Poorly rejects ball 4, by whiffing or making an out
That last column, the walkaway rate, is simply the count of poorly-rejected divided by the number of opportunities.
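In code, the walkaway rate is a one-liner. The sketch below assumes that an "opportunity" is any pitch that ended in one of the three outcomes listed above, which is my reading and not something spelled out here. The function and parameter names are made up.

```python
def walkaway_rate(accepted, lucky_rejects, poor_rejects):
    """Poorly-rejected ball 4s divided by opportunities.

    Assumes opportunities = accepted + lucky_rejects + poor_rejects.
    """
    opportunities = accepted + lucky_rejects + poor_rejects
    if opportunities == 0:
        return 0.0
    return poor_rejects / opportunities

# Hypothetical hitter: takes 80, gets a hit on 5, whiffs or makes an out on 15.
print(walkaway_rate(80, 5, 15))  # 0.15
```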
Wednesday, August 07, 2019
I asked Statcast Intern Kristen to review my original classification of Barrels and the other 5 batted ball outcomes.
Here's her report (pdf). Other than to make one tiny almost inconsequential factual correction, the entirety of the report is hers, and went through no edits on my part. Based on her findings, in the offseason, we'll need to tweak our definitions.
Friday, July 26, 2019
If a ball is hit 390 feet to dead center, how do we want to handle the CF positioned 300 feet (0% catch prob) from home compared to being positioned 340 feet (100% catch prob) from home? And how do we want to handle the batter? Is it a ball hit 390 feet and so is a HR almost 50% of the time? Or is it a ball that lands 15 feet short of the fence, and so is a HR 0% of the time? And what about the pitcher in all this? Do we care about the ball, the batter, the pitcher, or the fielder? Or all of them? Are they intertwined or independent?
Trying to get “one” answer to multiple legitimate questions is how we get into trouble. If you make the metric so specific that it can ONLY answer one question, you’ve boxed yourself in.
This is why I like FIP: it does what it does, no more, no less. And it allows other things to be built on top of it. So in my view, I like speed+angle, because those are the two things the batter has the most influence on. And it’s “scaled” to hits or wOBA. “Hit Probability” is too specific a term for what the metric is doing. We may call it “hit probability”, but it’s more “Speed and Angle impact to hit probability”. That's the metric. If we wanted to include EVERYTHING, then guess what: the hit probability is what you see, it was either caught or not. You have to decide what you want to peel away, and more importantly WHY.
If you follow the FIP approach, what you care about is the influence the player has on a typical play, not THAT ball, and certainly not THAT play. There's no right or wrong answer. You just need to define your question very specifically, and live with the consequences of its implication. FIP takes a minimalist approach, doesn't try to do too much, and so, is flexible. That's why speed+angle is what we use for batted balls.
Wednesday, June 19, 2019
UPDATE: (2019/06/20 10:10)
Following a comment by Saber Watchdog Hareeb, who pointed out a potential bias by temperature (since I did not control for park), I looked into it, and he was totally right. The bias was noticeable. As a result, I updated the methodology below to control for temperature and elevation, and excluded Coors altogether. The text and charts have been updated to reflect the new data. The overall conclusion in terms of direction mostly remains, but the magnitude is reduced in terms of the effect of movement, but not in terms of the effect of runs (overall, but it did have an effect on each pitch type). Thanks Hareeb!
Methodology
Following up on the research method I developed when I looked at speeds of pitches, I am now turning my attention to movement.
Let me give you a brief introduction to that methodology, which has changed ever so slightly:
- Start with all pitches thrown in 2018
- Remove all untracked pitches and all knucklers
- Collapse all pitches into one of these six categories, our Arsenal of Pitches: Risers, Sinkers, Cutters, Changeups, Sliders, Curves
- Tag every pitch as to whether it was low-elevation (under 300 feet) or high-elevation (over 300 feet). Within each, create three even buckets of Low-, Medium- and High-Temperature.
- Remove any pitcher that threw fewer than 500 pitches (~30IP)
- Remove any pitches where a pitcher threw fewer than 10 pitches of one of the above types
- Remove all pitches at Coors
What we are left with is therefore: all 464 pitchers in 2018, who threw at least 500 pitches, of which at least 10 were classified as one of riser, sinker, cutter, changeup, slider, curve
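The filtering steps above can be sketched roughly as follows. The record layout (dicts with 'pitcher', 'pitch_type' and 'park' keys) is hypothetical, and applying the filters in the same order as the list is my assumption. Untracked pitches are taken to be already gone, and the elevation/temperature tagging is left out.

```python
from collections import Counter

ARSENAL = {"Riser", "Sinker", "Cutter", "Changeup", "Slider", "Curve"}

def filter_pitches(pitches, min_total=500, min_per_type=10):
    """Apply the sample filters to a list of pitch dicts.

    Each dict is assumed to carry the hypothetical keys
    'pitcher', 'pitch_type' and 'park'.
    """
    # Keep only the six arsenal categories (this drops knucklers).
    kept = [p for p in pitches if p["pitch_type"] in ARSENAL]

    # Remove any pitcher who threw fewer than min_total pitches.
    totals = Counter(p["pitcher"] for p in kept)
    kept = [p for p in kept if totals[p["pitcher"]] >= min_total]

    # Remove pitch types a pitcher threw fewer than min_per_type times.
    per_type = Counter((p["pitcher"], p["pitch_type"]) for p in kept)
    kept = [p for p in kept
            if per_type[(p["pitcher"], p["pitch_type"])] >= min_per_type]

    # Remove all pitches at Coors.
    return [p for p in kept if p["park"] != "Coors"]
```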
Separation
Now we start our separation. We use total inches of movement as our focus. By total inches of movement, I mean the difference in location of the pitch in 2D space at plate crossing, compared to where it would have crossed, if we removed the effect of spin.
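One way to read "total inches of movement" is as the straight-line distance, at the plate, between where the pitch actually crossed and where it would have crossed without spin. The sketch below assumes exactly that (a plain Euclidean distance in the horizontal/vertical plane). The parameter names are illustrative, and how the spin-free location is computed is outside this example.

```python
from math import hypot

def total_movement(actual_x, actual_z, spinless_x, spinless_z):
    """Distance in inches between the actual and spin-free plate crossings.

    x is horizontal, z is vertical; all four inputs are in inches.
    """
    return hypot(actual_x - spinless_x, actual_z - spinless_z)

# A pitch that ends up 3 inches sideways and 4 inches up from its
# spin-free crossing point has 5 inches of total movement.
print(total_movement(3.0, 4.0, 0.0, 0.0))  # 5.0
```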
For each Arsenal of Pitches (so for example, focusing on Verlander's 816 Sliders) at each Park Grouping (so for example, high-elevation, high-temperature we have 42 Verlander sliders), we break them up into 5 groups based on how much the pitch moved. For the 10% (or 4) pitches that had the most movement, we put them in Group 1. For the 10% pitches that moved the least, we put them in Group 5. For the 40% (17 pitches) that had average movement (for Verlander), we put them in Group 3. We round out Group 2 (20% of pitches, above average movement), and Group 4 (20% of pitches, below average movement).
In other words, we have progressively less movement, as we go from Group 1 to Group 5. And the percentage of pitches in each group follows a 10/20/40/20/10 shape.
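Here is a minimal sketch of the 10/20/40/20/10 split for one pitcher, one pitch type and one park grouping. Pitches are ranked from most to least movement and cut at the 10th, 30th, 70th and 90th percent marks. How fractional cut points are rounded is my assumption, so group sizes can differ by a pitch or so from the counts quoted above.

```python
CUMULATIVE_PCT = (10, 30, 70, 90)  # upper bounds of Groups 1 through 4

def assign_groups(movements):
    """Return a group number (1-5) for each pitch, in input order.

    Group 1 holds the pitches with the most movement, Group 5 the least.
    """
    n = len(movements)
    # Rank index at which each of Groups 2-5 starts (rounding is an assumption).
    cuts = [round(n * pct / 100) for pct in CUMULATIVE_PCT]
    # Pitch indices ordered from most movement to least.
    order = sorted(range(n), key=lambda i: movements[i], reverse=True)
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = 1 + sum(rank >= cut for cut in cuts)
    return groups
```

With 42 pitches this rounding gives groups of 4, 9, 16, 9 and 4. Because every pitcher's pitches are split against that pitcher's own distribution, each pitcher contributes the same share of pitches to every group.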
This is what it looks like so far. We have 420 of 464 pitchers who threw Risers (4-seam fastballs). Those pitchers averaged 1390 pitches, of which 527 were Risers. They are broken up into 5 groups as shown (the sum of which is 527 pitches).
Movement
Now, how much movement is there from Group 1 to Group 5? Roughly speaking, there's about 1.7 inches of movement between each Group. For Risers and Sinkers, group to group, the difference is about 1.5 inches. While for Curves, the difference is about 2.0 inches group to group. Cutters, Sliders, Changeups are 1.7 to 1.9 inches. Overall, the average is 1.7 inches of difference group to group, or 6.6 inches from high to low groups.
Here's that full chart.
So far, we've selected a large group of pitchers, with a separation by pitch type, and further segmented into amount of movement. The key here, since I did not hit you over the head with it, is that we have proportionate representation of our pitchers. We don't have Verlander's Riser all part of Group 1 and Group 2, even though he throws very hard. No. Only 10% of his Risers are in Group 1. Just like 10% of Jason Vargas "fastballs" are in Group 1. In this way, we are not biasing our data. Proportionate representation. This is the key to this study.
You will notice that we did not control for speed of pitch. Is it possible that the amount of movement is tied to the speed of the pitch? Sure, it's possible, and we can check. While it is possible in theory, in practice, once we group the pitches, there is no bias in pitch speed. This is the other key to the study, though more by happenstance: the grouping reduces the chance of bias for ancillary reasons. Here are the results of that. There may be an issue with Cutters, as well as with Curves.
The speed-bias on curves is interesting: the faster it is thrown, the less time in the air, and so, less time for the ball to move. Of course, it's also less time for the batter to react. We're not talking about much of a bias anyway, but we'll deal with that in a future iteration.
Impact by Movement
Anyway, now that we're all happy with where we are, we want to find the IMPACT of movement. To do that, we need a metric. And that metric will be Run Values. A HR has a certain amount of run value (+1.4 runs), a strikeout has a certain amount of run value (around -0.27 runs). Every event has a certain run value. But, we also have non-outcome events, notably balls and strikes. Throwing a strike reduces the potential for future runs, while throwing a ball increases that potential. Generally speaking, it's about +0.06 runs for a ball and -0.06 runs for a strike. Basically.
All we have to do now is for every pitch, simply add up the run values based on the event on that pitch (HR, walk, strike, etc). I'll first give you the overall results, then we'll dig deeper for every Arsenal of Pitches. The results are presented in terms of run value per 100 pitches.
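As a sketch, "run value per 100 pitches" is just the sum of the per-pitch run values, divided by the number of pitches, times 100. The lookup below uses only the rough values quoted above. A real table would have an entry for every event, and the names here are mine.

```python
# Approximate run values quoted in the text (from the batter's side,
# so negative numbers are good for the pitcher).
RUN_VALUES = {
    "ball": 0.06,
    "strike": -0.06,
    "strikeout": -0.27,
    "home_run": 1.4,
}

def run_value_per_100(events):
    """Total run value of a list of pitch events, scaled to 100 pitches."""
    total = sum(RUN_VALUES[event] for event in events)
    return 100 * total / len(events)

# Hypothetical mix: 60 strikes and 40 balls, no other events.
print(round(run_value_per_100(["strike"] * 60 + ["ball"] * 40), 2))  # -1.2
```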
Note that a negative number means runs are reduced, the way a lower ERA is better for the pitcher. In other words, negative is good for the pitcher.
The difference between a pitch with the most movement and a pitch with the least movement (ceteris paribus, or all other things equal) is almost 0.80 runs per 100 pitches. You'll remember that the difference in movement from pitches with the most movement to the least movement was 6.6 inches. In other words, each inch of movement is worth 0.12 runs per 100 pitches. And since there are about 150 pitches per 9 IP, that essentially means each inch of movement will affect your ERA by 0.18.
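The conversion from the run gap to an ERA effect is simple enough to write out. This sketch just reproduces the arithmetic with the three figures from the paragraph above. The variable names are mine.

```python
run_gap_per_100 = 0.80      # runs per 100 pitches, most vs least movement
movement_gap_inches = 6.6   # inches, most vs least movement
pitches_per_9ip = 150       # approximate pitches per 9 innings

# Runs per 100 pitches gained (or lost) for each inch of movement.
runs_per_inch_per_100 = run_gap_per_100 / movement_gap_inches

# Rescale from 100 pitches to 9 innings to express it on the ERA scale.
era_per_inch = runs_per_inch_per_100 * pitches_per_9ip / 100

print(round(runs_per_inch_per_100, 2))  # 0.12
print(round(era_per_inch, 2))           # 0.18
```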
Arsenal Impact
Now, let's drill down to the Arsenal of Pitches level. Let's start with Risers (4-seam fastballs). Here we see that the change in run value is fairly dramatic. We are comparing 93mph fastballs that move 21 inches to 93mph fastballs that move 15 inches, thrown by the same pitchers in each group. The run value goes from -0.46 runs for fastballs with the most movement to +0.80 runs for fastballs with the least, a dramatic difference of 1.26 runs per 100 pitches. So, fastballs need to move a lot. And the more they move, the more impact they have.
Sinkers follow a very similar pattern. The effect of Changeups is a bit more muted, with about half the effect, but it follows a similar pattern to Risers and Sinkers.
The effect of Sliders is much more muted. Generally speaking, sliders that move more have more impact. But it's more in terms of a threshold. As you can see in Groups 1, 2, 3, they all have similar run values. A slider needs to move, but it doesn't need too much movement. If we compare Group 3 (-0.59 runs) to Group 5 (-0.27 runs), that's a difference of 0.32 runs (per 100 pitches), compared to a difference of movement of 3.8 inches. That's in the ballpark of 0.08 runs per inch. So, we definitely don't want a non-moving slider.
Curves follow a very similar pattern to Sliders. You want a curve to move, but you don't need too much movement.
My guess: a curve that moves "too much" is probably a pitch that's going to be way outside the strike zone. In other words, Curve balls are probably going to be thrown to the edge of the strike zone to begin with. Throw them with too much movement, then suddenly, they are way off the plate, and the batter won't be fooled.
A curve that doesn't move enough, essentially a "hanging curve", does have a very notable effect. The difference in run value between Group 3 (-0.19 runs) and Group 5 (+0.34 runs) is 0.53 runs for 4.0 inches of movement. So, this is fairly notable, 0.13 runs per inch.
Cutters are a different story. Their impact does not seem to be tied to their movement. Or if it is, it is not a clean distribution. I would say that Cutters are exceptional cases, and need special handling.
Conclusion
Overall, I'm quite pleased with the results. The general direction is maintained, that the more movement, the more effective. We learned that some pitches (pitches that tail, meaning Risers, Sinkers, Changeups) are tied heavily to their movement, and pitches that hook (Curves, Sliders) are tied to a minimum level of hook needed. And we learned we need more learning when it comes to Cutters.
Another thing to note is the naming of pitches. This has been an issue for me, that pitchers call their pitches whatever they want to call them, and our policy is to call the pitches what the pitchers call them. So two pitchers can throw the same pitch, at the same speed with the same movement, and one will call it a slider and the other a cutter. Or one will call it a slider, and the other a curve. At some point, I'm going to create my own naming system (without using the words cutter, slider, curve) so as to not conflict with current convention.