Just the first step in looking at this. The left column is the height of the left knee, the top row is the right knee.
The most common position for the catcher is for the left knee (the glove knee) to be 3-5 inches off the ground, while the right knee is 17 to 20 inches off the ground.
We do see the catcher often enough with their right knee 3-5 inches off the ground, with the left knee 11-20 inches off the ground. I should probably split this by bat-side (and maybe pitch-hand).
Having both knees up, 14-19 inches off the ground, is the least popular of the setups.
I'll be looking as well to see how the called strike rate is affected based on the catcher stance.
Hitting a 450 foot HR is very indicative of a batter's talent. It shows that he has raw power and it shows that he can really put the barrel on the ball.
Hitting a 110 mph high popup to an outfielder for an easy out is also a good indication of a batter's talent. It shows that he has raw power and that a small mistiming is what kept him from hitting a 450 foot HR. This is what is called a Major League Out. For that particular PLAY, an out is an out, and is always bad. For that particular PLAYER, a Major League Out is almost a HR.
Similarly, a Texas Leaguer that clears the infield and lands in front of the outfield is always good for that PLAY, it is not a good indication of talent for that PLAYER.
So, let's talk about Expected wOBA and Predictive wOBA. Expected wOBA is the expected value of that PLAY. A Texas Leaguer is going to have a near 100% hit probability, and so if you have a low launch speed and a high launch angle, you are going to get an Expected Hit Probability that will approach 100%. It will explain that PLAY in RETROSPECT. It is much (much much) better to think of Expected Value in Retrospect, as in: The Expected Hit Probability WAS... Obviously, the word Expected can be used in both backward (expected was) and forward (expected will be), and so is confusing and ambiguous. The x-stats that you see, whether at Savant or anywhere else on the Web, are almost always meant to be retrospective, and simply measuring the PLAY.
Predictive wOBA is different. Predictive wOBA is tied to the PLAYER and not the PLAY. This is a critical distinction to make. A Major League Out is a much better outcome in describing the talent of a player than a Texas Leaguer Hit. When you see a Major League Out, as a fan, you should be disappointed, but as a scout, you should be elated. And the reverse when you see a Texas Leaguer Hit. The distinction between Expected wOBA (describing the PLAY) and Predictive wOBA (describing the PLAYER) is what you need to constantly keep in mind.
Let me show you three charts (click to embiggen). The first maps the current season's actual wOBA (actually it's wOBAcon, since we are only looking at batted balls, or Contacted plate appearances) to next season's wOBA (wOBAcon actually), for the 2020-2023 seasons, minimum 100 batted balls. Why do we compare to next season's wOBA? Because that is a (mostly) unbiased estimate of a batter's talent. That is a correlation of almost r=.5, which is pretty good. You can see that the slope of the estimate is close to .5, which means that next season's wOBA can be estimated as half-way between the current season's wOBA and the league average. So, a HR (wOBA of 2.000) in 2022 will indicate a wOBA of 1.180 of talent in 2023 (half-way between 2.000 and league average of around .360).
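The half-way arithmetic above can be sketched in a few lines. This is only an illustration of the blend described in the text: the function name and the `weight` parameter are made up for the example, and the league average of .360 is the rough figure quoted above.

```python
def regress_to_mean(observed, league_avg=0.360, weight=0.5):
    """Blend an observed wOBA with the league average.

    weight is the share given to the observed value, which plays the
    role of the regression slope (about .5 in the text).
    """
    return weight * observed + (1.0 - weight) * league_avg


# The home run example from the text: 2.000 observed, slope of about .5
print(round(regress_to_mean(2.000), 3))  # 1.18
```

Passing a different `weight` gives the same blend for any other slope.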
As we know, Actual Outcomes are filled with vagaries of the fielders and the park and the ball and on and on. This is why we prefer Expected wOBA over Actual wOBA. Expected wOBA is focused on those launch characteristics most in control of the batter (launch angle and speed), without worrying about whether the ball carries for 300 feet or 320 feet, or pulled at 20 degrees or 30 degrees, or how good the fielding alignment is positioned or how well the fielder reads the ball. All of those variables are what turn an Expected wOBA into an Actual wOBA. And Expected wOBA describes a batter's talent better than Actual wOBA, which you can see by the correlation having an r above .55. The slope of the line also suggests that you would weight the Expected value at 58%, and the league average at 42%. Every combination of Launch Angle (from minus 90 to plus 90) and Speed (from 0 to 125) has a distinct Expected wOBA value. The chart is obviously massive at 181 x 126 entries.
Now we come to the star of the show, Predictive wOBA. How does it work? We break up a batter's launch characteristics of each play, first along Launch Angle, into three categories. We have the Ideal Launch Angle, the Sweetspot range, of 8 to 32 degrees. We have launch angles above 32 and launch angles below 8 degrees.
Then we break up a batter's Launch Speed into four categories: 105+, 100 to 104.999, 95 to 99.999, and under 95 mph.
This gives us 12 combinations of speed and angle. For each combination, we get a Predictive Value (analogous to Expected Value, but in terms of the PLAYER, and not the PLAY). Here are those values.
So, when a batter gets his Major League Out, or any batted ball at over 32 degrees of launch at 100+ mph, that has a Predictive wOBA value of .838. This is one of the BEST things a batter can do, as it indicates TALENT. That's what Predictions are: an estimate of the TRUE TALENT of a PLAYER. That perfect hit, a ball hit at the Ideal Launch Angle of 8 to 32 degrees, at 105+ MPH: that is only SLIGHTLY more indicative of talent than a Major League Out: that has a Predictive wOBA value of .867.
The worst thing a batter can do is a high launch angle and low-speed, as that has a true talent value, a Predictive wOBA value of .206.
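Here is a minimal sketch of the binning described above, assuming the boundaries are read exactly as written (Sweetspot is 8 to 32 degrees inclusive). The full table of values is not reproduced in the text, so the lookup only holds the three values that are quoted; every other bin returns None. Treating "low-speed" as the under-95 bin is an assumption, and the function and label names are invented for the example.

```python
def angle_bin(launch_angle):
    """Three launch angle categories: above 32, 8 to 32, below 8 degrees."""
    if launch_angle > 32:
        return "high"
    if launch_angle >= 8:
        return "sweetspot"
    return "low"


def speed_bin(launch_speed):
    """Four launch speed categories in mph."""
    if launch_speed >= 105:
        return "105+"
    if launch_speed >= 100:
        return "100-105"
    if launch_speed >= 95:
        return "95-100"
    return "<95"


# Only the values quoted in the text. The .838 value covers every ball
# over 32 degrees at 100+ mph, so it fills two of the twelve bins.
KNOWN_VALUES = {
    ("high", "105+"): 0.838,
    ("high", "100-105"): 0.838,
    ("sweetspot", "105+"): 0.867,
    ("high", "<95"): 0.206,  # assumption: "low-speed" means under 95 mph
}


def predictive_woba(launch_angle, launch_speed):
    """Return the quoted Predictive wOBA value, or None if not quoted."""
    return KNOWN_VALUES.get((angle_bin(launch_angle), speed_bin(launch_speed)))


print(predictive_woba(40, 110))  # the Major League Out
print(predictive_woba(20, 106))  # the perfect hit
```

Averaging these per-ball values over a batter's season would give the season-level figure, once the remaining bins are filled in from the actual table.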
Once we apply the Predictive wOBA on each batted ball for every player and aggregate it at the season level, we can then compare to the next-season's Actual wOBA. And this is what we get: a correlation of r=0.61. Indeed, if you include all three measures, the Actual wOBA, the Expected wOBA and the Predictive wOBA, we STILL get an r=0.61 to next season's wOBA. This suggests that creating these 12 bins (it's actually 8, as some values stretch across more than one bin) is sufficient to describe a batter's profile. And we can completely ignore a batter's Actual wOBA as well as their Expected wOBA.
Can you improve this? The next step really to make it truly predictive is to also incorporate the amount of batted balls in the sample. The higher the number, the more indicative the outcomes are. But, we'll save that for a future thread.
So, we no longer need to compare Expected wOBA to Actual wOBA and talk about luck or Random Variation as being the distinguishing feature as to why the Expected wOBA diverged from Actual wOBA. No. What we actually care about is Predictive wOBA, and we don't even care about Actual wOBA any more, not if we care about the True Talent of our batter.
Next time, I'll repeat this process for Pitchers. I haven't done it yet, so I'm just as curious as you are.
As we continue our look into the Bat Swing data we've been receiving, a centrepiece article that you should read, or otherwise keep in the back of your mind, is from Alan Nathan from 2015. He naturally has been doing this for even longer, and that article is a terrific summary of everything he had learned, put in a way that a baseball fan can appreciate. We're just standing on the shoulders of giants here.
One of the key pieces is the Attack Angle, which put simply is the trajectory of the bat at the collision point. To put it a bit less simply, it is the tangent line to the path of the bat. To put it even less simply, it is based on the velocity vector, normalized ("unitized"). While we are interested in the angle at the collision point, we can actually calculate this angle along any part of the trajectory of the bat. All we need to do is calculate its velocity vector.
The vector can helpfully be broken down into a vertical component and horizontal component. The vertical component is a contributing factor to the resulting launch angle. The horizontal component is a contributing factor to the resulting spray direction.
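As a rough sketch of the decomposition above: normalize the velocity vector, then read off a vertical angle and a horizontal angle. The axis convention (x to the side, y toward the field, z up) and the sign of the horizontal angle are assumptions made for this example, not the convention of the tracking data.

```python
import math


def attack_angles(vx, vy, vz):
    """Return (vertical_deg, horizontal_deg) for a bat velocity vector.

    Assumed axes: x points to the side, y toward the field, z up.
    """
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    if speed == 0:
        raise ValueError("velocity vector has zero length")
    # Normalize ("unitize") the velocity vector.
    ux, uy, uz = vx / speed, vy / speed, vz / speed
    vertical = math.degrees(math.asin(uz))        # up/down component
    horizontal = math.degrees(math.atan2(ux, uy))  # side-to-side component
    return vertical, horizontal
```

The same function works at any frame of the swing, since all it needs is the velocity vector at that instant.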
It's always helpful to use a perfect launch swing, and those are easy enough to find: look for long homeruns, like this video. And the Attack Angle for the swing, from the start to the end, can be seen in this chart (click to embiggen), where 0 is at or very close to the impact point.
So what are we seeing exactly here? It's probably easiest to start with the bottom chart, even though that will be the less interesting one. The swing goes full circle: it starts at one horizontal angle, and eventually comes full circle back to the same angle. If you can remember that plus 180 degrees and minus 180 degrees is the same point, then you can see how the swing indeed comes full circle. At the impact point, it's at minus 10 degrees. In other words, the bat path is 10 degrees to the pull side, which is pretty normal for a HR that was pulled. Had the contact happened 4 milliseconds earlier, the bat path would have been at close to 0 degrees (straightaway). When we talk about a game of inches and milliseconds, we mean it quite literally.
We do see a dip right after the collision point. That, I believe, is an artifact of the way the data is generated. In order to create the bat path, the swing is broken down into a pre-impact and post-impact trajectory, and then "connected". When you connect two disjointed lines like that, things like this will happen. Eventually we should create a process to smooth this out, but we're still in the development phase here, and we're sharing as we're learning.
Now, the more interesting part is the top part, the vertical component of the Attack Angle. This is because the launch angle is far more important for the batter than the spray direction. And we see it follows a golf type path where the vertical angle dips heavy into the negative at minus 50 degrees, at a point 50 msec from impact. In other words, had the collision point happened at that point, that bat would be hitting the ball at a heavy downward angle. In the actual case, he hit it perfectly, at an Attack Angle of +17 degrees (in the vertical portion). If you check out Alan's article, you will see "Table 2: Optimum swing parameters for maximum distance", with a value of around +18 degrees. Score another one for creating a model to match eventual reality. And like above, being early by 4 milliseconds would have made a big difference: + 9 degrees instead. (And that blip exists here as well. Again, just something we have to work our way through.)
While a swing speed is critical, and being able to make contact at the sweetspot is important, the most important part is in fact making contact in the first place. It's obvious to say, but MLB players are so incredibly gifted that what can be assumed at this level is nowhere close to a given at the lower levels.
We have been using "Barrels" in the Statcast era based on an inference model: if your launch angle and speed reached a certain range, it leads to high production, and so, we would infer that your swing speed and contact point in the sweet spot was optimal. For the purposes of the data we had, that worked out great.
Well, now we are starting to have access to more data. And instead of inferring if a ball has been barreled, we can more directly measure it.
In the chart below (click to embiggen), we show the wOBA at various combinations of swing speeds and impact points on the bat. The Green Box has an average wOBA of .624, and that refers to a sweetspot at 5-8 inches from the head of the bat, with a swing speed of 80+ mph. I don't know what to call this other than "barreled" (based on swing data), but that would come into conflict with our other definition of Barrel (based on launch data).
In the orange box, we see the impact point on the part of the bat just outside the sweetspot (4 to 4.99 inches and 8 to 8.99 inches). Just as interesting is the next line, 70 to 79.99 mph, where the actual sweetspot (5-8 inches) gives results not that much different from one inch outside the sweetspot.
In order to have major league success, you need to swing at least 70mph and be able to make contact 4 to 9 inches from the head of the bat. If you can't make contact there at that speed, you won't be able to survive in the majors.
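The boxes described above can be sketched as a simple classifier. The zone labels are invented for this example, and reading "5-8 inches" as 5 up to (but not including) 8 is an assumption made so that it does not overlap the 8 to 8.99 band.

```python
def contact_zone(swing_speed_mph, impact_in_from_head):
    """Label a swing by swing speed and impact point (inches from the head).

    Thresholds follow the text: 80+ mph with contact at 5-8 inches is the
    green box, one inch either side is the orange box, and 70+ mph with
    contact at 4-9 inches is the minimum for success.
    """
    d = impact_in_from_head
    in_sweetspot = 5.0 <= d < 8.0
    just_outside = (4.0 <= d < 5.0) or (8.0 <= d < 9.0)

    if swing_speed_mph >= 80 and in_sweetspot:
        return "barreled (swing-based)"
    if swing_speed_mph >= 80 and just_outside:
        return "just outside sweetspot"
    if swing_speed_mph >= 70 and (in_sweetspot or just_outside):
        return "viable"
    return "below threshold"


print(contact_zone(82, 6.7))
print(contact_zone(75, 8.5))
```

Keeping the swing-based label separate from the launch-based Barrel avoids mixing the two definitions.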
The data we currently have is very limited, so things will change. That said, directionally, all the results conform to expectations. And what we are now learning is the magnitude of the effects. So all this is preliminary, but very promising.
Below is the wOBA by Swing Speed and Impact Point of the Ball (relative to the Head of the Bat). The Sweetspot of the bat is close to 6.7 inches (17 cm). We'll figure that out as we get more data.
(click to embiggen)
In order to find success at the major league level, at a minimum, you need to be able to swing the bat consistently at 70+ mph. Anything less and, regardless as to how perfectly you can time your swing, you won't be successful at the major league level. By the time a player has reached the major league level, you can pretty much assume that they've been selected for this skill (even if they don't necessarily measure this particular attribute).
More importantly, they need to be able to hit the Sweetspot, or close to it. Even if you miss the sweetspot toward the head of the bat by 2 inches (say, making contact at 5 inches instead of 7 inches), the extra (linear) speed of the bat at that impact point will somewhat make up for the difference of missing the perfect sweetspot.
It is less forgiving the other way: missing the sweetspot by 1 or 2 inches (toward the knob of the bat) also means that the (linear) speed of the bat at the impact point is less. So, you end up losing on both counts.
That said, a contact point 9 inches from the head of the bat is better than 4 inches from the head of the bat: starting at the sweet spot, you lose more energy from the bat as you go toward the head than toward the knob. And the extra (linear) speed of the bat can only help so much.
This is somewhat useless without also knowing the swing speed (a slow bat that hits the Sweetspot is not going to do damage). To the extent that we want to drive the importance of the Sweetspot, this allows us to create metrics as to how often the batter is squaring up the ball in the Sweetspot. It looks like the success point is around the 4-9 inch mark, but we'll know more as more data comes in.
I'll marry the sweetspot to the swing speed in my next blog post.
For the last five years, I've been thinking, working on, and otherwise dreaming about "point of no return" in terms of a player's swing: What is the very last point at which a player can successfully check his swing?
And with the data we are collecting, we are going to be very close to determining that.
This is the frame-by-frame speed of four swings: 3 checked swings and the hardest hit ball we have. (click to embiggen)
And somewhere around 100 msec before the impact time, a checked swing and a perfect swing are moving at similar speeds. The batter is committed at 100 msec, and is rapidly accelerating his swing.
At 50 msec before impact, the committed swing is already at 40 mph, which is about what a checked swing will be at the impact point.
In terms of what we can measure, there's a couple of choices here.
We can see how many MPH a batter can add to his swing in the last 50 msec before impact. In this illustration, that's about 58 mph.
We can see how many msec it takes a batter to add 50 mph to his swing. In this illustration, that's 42 msec.
We can try both, and ask starting at 40 mph, how many MPH and msec does it take for the batter to reach his speed at impact. In this illustration, that's 60 mph in 55 msec.
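The three candidate measures above can be sketched against a frame-by-frame speed series. This assumes the series is given as times in msec relative to impact (negative before impact, last sample at 0) with speed rising steadily up to impact, and it uses plain linear interpolation between frames. The function names are made up for the illustration.

```python
def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    if x < xs[0] or x > xs[-1]:
        raise ValueError("x is outside the sampled range")
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def mph_added_in_window(times_ms, speeds_mph, window_ms=50.0):
    """MPH gained over the last window_ms before impact (time 0)."""
    return speeds_mph[-1] - _interp(-window_ms, times_ms, speeds_mph)


def msec_from_speed_to_impact(times_ms, speeds_mph, start_mph):
    """Milliseconds between first reaching start_mph and impact.

    Relies on speed increasing up to impact, so that looking up a time
    from a speed is well defined.
    """
    return -_interp(start_mph, speeds_mph, times_ms)


def msec_to_add(times_ms, speeds_mph, mph=50.0):
    """Milliseconds needed to add the final `mph` before impact."""
    return msec_from_speed_to_impact(times_ms, speeds_mph, speeds_mph[-1] - mph)


def ramp_from(times_ms, speeds_mph, start_mph=40.0):
    """(mph gained, msec taken) from start_mph up to the speed at impact."""
    gained = speeds_mph[-1] - start_mph
    return gained, msec_from_speed_to_impact(times_ms, speeds_mph, start_mph)
```

Each function takes two parallel lists, so the same series can feed all three measures and they can be compared side by side.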
As we see more and more data, the presentation for fan consumption will start to present itself. We're looking forward to that moment.
Update: We can also calculate the rotational distance of the head of the bat. For well-hit balls, the rotational distance is 30 inches for most of the swing, then around 34 inches at impact. (Most major league bats are 33.5 - 34 inches.) That red line follows that model. (Click to embiggen)
My interest is with checked swings, which are the other three lines (blue, orange, green). And here we see that the rotational distance quickly gets to 24 inches. When we couple this chart with the one above, we can see that while we can't tell based on speed alone that the batter is checking his swing until 100 msec from impact, the rotational distance tells a very different story: the batter is already preparing to pull his bat back almost from the start of the swing.
It will be interesting, as more data comes in, to see if we can generalize more about how a batter is reacting.
This seems like it should have an obvious answer, but it does not. What is Swing Speed? Let me count the ways. There are at least FIVE reasonable answers to this question.
You can decide to measure the linear speed at the head (top) of the bat. That seems to give you a constant number. It won't. This is especially true when the rotational point is something other than the knob of the bat, which happens when you lunge for a ball for example.
You can decide to measure the linear speed at the presumed "sweetspot" (about 6 to 7 inches from the top of the bat). This seems even better, since that's (mostly) where the collision happens. It's still not constant though. The rotational point from above applies here.
You can decide to measure the linear speed at the collision (impact) point. This is where the energy transfer actually happens, and so, where the calculations are centered. Which makes good enough sense. Except what do you do for non-collisions, like swing and miss? Even if you decide to use a "default" sweetspot point for swing-and-miss, you won't have an actual collision time, and so, WHEN do you take the speed of the bat? And imagine when you just "barely" touch a ball nowhere near the sweetspot: now you have different places for measurement based on outcome, which is NOT something you want to do.
You can take the angular speed. This would at least seem to be a reasonable baseline, bypassing the need to know the impact point. Think of a record player, where it doesn't matter what kind of disc you put on the turntable, it'll spin at the same (angular) speed, even though the outside of the disc is spinning at a faster (linear) speed (the outside of the disc covers more linear distance than the inside, but both travel the same amount of time). The unit here is radians per second. So, 50 radians per second means... what exactly? It means 8(*) rotations per second (if you try to use the turntable analogy), but what does THAT mean either? Maybe a Bugs Bunny cartoon would like to represent a swing as 8 rotations per second. The swing-and-miss timing issue still applies.
Finally, we can take advantage of the angular speed by establishing a constant point, like 27 inches from the rotational point (I'll call this R27). 27 inches conforms roughly to the sweet spot. A bat is 33.5 to 34.0 inches, and so that puts the sweet spot at 6.5 to 7 inches from the head (assuming the rotational point is the knob). An angular speed of 50 radians/sec is 76.7 mph (**). This at least gives us the units to tie swing speed to launch speed, and at a point that is roughly where the energy transfer is targeted to happen.
(*) A circle is 2 x PI, or about 6.28 radians. 8 rotations is therefore ~50 radians.
(**) 27in / 12in/ft x 3600s/hr / 5280ft/mi = 1.534 mph x sec, and that's the multiplier for angular speed (1/sec) to R27 (in mph)
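The two footnotes can be checked with a short sketch. The function names are invented for the example; the 27-inch default is the R27 point defined above.

```python
import math

INCHES_PER_FOOT = 12.0
SECONDS_PER_HOUR = 3600.0
FEET_PER_MILE = 5280.0


def linear_speed_mph(angular_speed_rad_s, radius_in=27.0):
    """Linear speed (mph) at radius_in inches from the rotational point."""
    feet_per_second = angular_speed_rad_s * radius_in / INCHES_PER_FOOT
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE


def rotations_per_second(angular_speed_rad_s):
    """Full turns per second: one rotation is 2 x PI radians."""
    return angular_speed_rad_s / (2.0 * math.pi)


print(round(linear_speed_mph(1.0), 3))       # 1.534, the R27 multiplier
print(round(linear_speed_mph(50.0), 1))      # 76.7
print(round(rotations_per_second(50.0), 2))  # 7.96
```

Changing `radius_in` shows how much the linear speed depends on where along the bat you choose to measure.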
What is Swing Speed? It really depends on the question. And you can have five possible answers.
There are three reasons for applying positional adjustments. In this particular thread, I will describe one of those three reasons.
Reason 1: The types of opportunities are similar, but the frequency is not
In other words: how do we handle outfielders?
When it comes to catching balls, you have balls hit in front of you, and balls hit behind you. You have balls hit slicing to your left, or hooking to your left. Or slicing to your right, or hooking to your right. Or hit right at you.
When it comes to throwing balls, you will throw to 1B or 2B or 3B or home plate. Or to the cutoff player. Or the relay player.
In other words, whether you are a LF, CF, or RF, you are going to face ALL of these situations. Naturally, you are going to face each of these situations at a different frequency. The reason you put your best fielder in CF is that there are more plays available at CF than in either of the corners. You can thank the fence being closer at the corners for that. The reason you put your stronger arm in RF is that there are longer throws made from RF than in LF. You can thank the direction of running for that.
But, when it comes to the specific skillset needed to play, they are all the same. You need to be able to handle all the situations noted above.
The baseline: single positional average
Now, the comparison point. There are two choices to be made here. The first is the traditional one: let's compare CF to other CF and LF to other LF and RF to other RF. This, by definition, will make the average LF plays "above average" equal to 0. The average RF is also 0. And the average CF is 0 plays above the average CF.
But we just said that the best fielder is in CF. Which also means the average CF is also better than the average corner outfielder.
So, what to do? How can we compare the average CF to the average LF, if both come in at "0" above average? The key is to determine how much value an average fielding CF has compared to an average fielding LF. There are multiple ways to get there. In the end, we've settled on these rough translations:
minus 3 runs in CF = plus 6 runs in LF = plus 6 runs in RF
In other words, if you are a slightly below average fielding CF, you have the same defensive impact as a somewhat above average fielding LF or RF. I think we can all accept that, right? It seems obvious enough that if you have half of the best fielders all in CF, then being slightly below average among the elites would make you above average when compared to the rest of the non-CF.
The baseline: one average
With Statcast OAA (outs above average), we actually have no reason to have three standards, only to then try to figure out how to unify them into one. We start with the one. We start with an average fielding outfielder.
As I mentioned at the start, all the outfielders are doing the same thing: they start some 300 feet away, they are going to run in some direction to a ball hit randomly around them. It's going to take about the same amount of time, the same number of feet. It basically becomes a straightforward process to start with the single standard.
And when we do that, we can see how well each of the three positions does compared to the same single standard. In terms of runs prevented on the range plays (so not including arm), this is how well the average fielder does at each position:
+5 runs CF
-1 runs RF
-4 runs LF
In other words, the average LF is 9 runs behind the average CF. And if we include the arm (which we will in 2022), it might come in as +5, 0, -5 respectively. In other words, there'll be a 10 run gap between LF/CF and a 5 run gap between RF/CF.
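To make the single-baseline idea concrete, here is a small sketch that moves a fielder from "runs above the average at his own position" to "runs above the average outfielder". It uses only the range-play figures listed above (+5 CF, -1 RF, -4 LF); the function and table names are made up for the example.

```python
# Average fielder at each position, in range runs relative to the
# average outfielder (figures from the list above, arm not included).
POSITION_AVG_RANGE_RUNS = {"CF": 5.0, "RF": -1.0, "LF": -4.0}


def runs_vs_average_outfielder(position, runs_vs_position_avg):
    """Re-express a positional rating against the single outfield baseline."""
    return runs_vs_position_avg + POSITION_AVG_RANGE_RUNS[position]


print(runs_vs_average_outfielder("CF", -3.0))  # 2.0
print(runs_vs_average_outfielder("LF", 6.0))   # 2.0
```

On these range-only numbers a CF at minus 3 and an LF at plus 6 land on the same value, which lines up with the rough translation quoted earlier for LF.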
Statcast OAA only goes back to 2016, so this method is going to be limited.
Takeaway
No matter how you choose to do it, you need to have some way to compare players across positions. Also note that what is currently true, that the quality of fielders is better at RF than LF, is not necessarily true historically. After all, right now, more balls are hit to RF than LF, because of the propensity to hit flyballs the other way. Therefore, with similar kinds of opportunities in RF and LF, you'll put the better fielding player in RF. Of course, there could be situations where you put the better fielder in LF and that rests entirely on the arm. A player that is a bit worse fielder with a much stronger arm will play RF, not LF.
This is why the positional spectrum established by Statcast OAA for outfielders might not necessarily hold historically. You'd have to do a careful analysis to determine what quality of fielders you find in LF and RF.
But it's almost a certainty that the best fielding outfielders are in CF. It's really a matter of magnitude: how much better? And that's what the job of a saberist is. It's not to find "if" something is true, like Is Platoon Advantage Real, Is Clutch Real, or Are CF Better Fielders? All these things are true. The interesting part is to determine the extent to which these things are true. How much is the effect? How many runs is an average CF worth compared to an average LF or average RF?
And in the current era, it's about 5 runs better than the average RF and 10 runs better than the average LF.
You can do season ranges. Trea Turner for example leads with 593 Bolts (runs of 30+ ft/s).
You can filter pitchers (either to see only pitchers, or include them, or exclude them). Luis Perdomo has the clear lead on speed here. I'd be a bit cautious on the lower-end, since this is technically "applied speed" as Cory Schwartz calls it. If you aren't trying to run hard, then we won't have much data to work with. This isn't problematic for position players, since we have volume there. And because of the universal DH rule, then other than Ohtani (who appears on the non-Pitcher list anyway), this pitcher list will be pretty much frozen.
I chuckle whenever someone brings up the idea that the "times thru the order" effect is "obvious" and "everybody knows about it".
Anyway, last year, I described the Pascal Run Value methodology. I limited it to 2020 and 2021 (through June, which is when I posted that). Not sure why I did that. Anyway, now I expanded to 2018-2021.
And because @enosarris brought up Times Thru Order, I also limited the pitches to the first time thru. His contention is that, possibly, two 95mph pitches aren't really the same, if one is thrown in the 1st inning and another is thrown in the 6th. That maybe there are other characteristics to that 95mph pitch, such as pitcher fatigue. Which is of course an excellent point. So, I limited this study to first time thru.
That said: it didn't really matter. Whether I looked at all pitches, or just 1st time thru, the conclusion was the same when it came to pitch speed. The best performing pitches were those thrown at "average" speed for that pitcher. The worst pitches were both the 10% fastest pitches and the 10% slowest pitches.
To show one example, for 4-seam fastballs, the breakdown looks like this:
+0.29 runs/100p @ 95.1 mph
-0.14 runs/100p @ 94.5 mph
+0.06 runs/100p @ 93.7 mph
+0.15 runs/100p @ 92.9 mph
+0.35 runs/100p @ 92.1 mph
The characteristics of each group were fairly similar otherwise, whether we are talking about break, spinrate, amount of gyro. The pitch location was slightly different, with the very fastest and very slowest least likely to be in the heart of the plate and most likely to be in the waste region. So, the presumption is that you over or under threw a pitch, and so had less control of the pitch. The big takeaway, which is probably well known: be consistent.
Next up: how much vertical break do we want?
I'm going to only focus on 1st-time-thru pitches, just to be clear, just like I did above.
Changeups: no real pattern
Sinkers: definite pattern of less vertical break, the better (aka, the more sink, the better)
4-seamers (or Risers): definite pattern of more vertical break, the better (aka, the more ride, the better)
Cutters: a mild pattern of more vertical break the better
Sliders: no real pattern
Curves: no real pattern
Next up: how much horizontal movement do we want?
Changeups: a definite pattern of the more tail the better
Sinkers: a definite pattern of the more tail the better
Risers: not much pattern other than not too much tail
Cutters: a definite pattern of the more hook the better
Sliders: no pattern
Curves: no pattern other than not too much hook
Next up: how much spinrate do we want?
Changeups: no pattern, other than not too extreme in either direction
Sinkers: not much pattern, other than maybe a bit lower is better
Risers: more spin is better
Cutters: not much pattern, other than maybe a bit lower is better
Sliders: no pattern
Curves: no pattern, other than not too extreme in either direction
I think you can tell a story, a logical story, with each one.
I also looked at Vertical Approach Angle (VAA). Well... how can I say this. Vertical Approach Angle will be tied to Vertical Release Angle (VRA). And so, what is very VERY clear with the data is that you don't want an approach angle that is too gentle or too steep. RELATIVE TO YOUR AVERAGE. All that we are seeing with the data is that an extreme VAA means you had an undesirable VRA. The numbers are simply over the top here. You want consistency.
So, let's set aside the really extreme ones, and let's focus on the next set: if you "err" on your VAA, where do you want to err?
Changeups: don't err. You simply want consistency, by far
Sinkers: don't err. You simply want consistency, by far
Risers: don't err. You simply want consistency, by far
Cutters: err on more gentle approach (-5.7 degrees), not a sharp angle (-7.2 degrees)
Sliders: err on more gentle approach (-7.1 degrees), not a sharp angle (-8.7 degrees)
Curves: err on more gentle approach (-8.7 degrees), not a sharp angle (-10.4 degrees)
Again, for all of them, you definitely do NOT want the 10% most extreme in either direction. It's simply a disaster that indicates you didn't throw it the way you wanted to throw it. Consistency, above all else.
Having described the impetus to Layered Hit Probability, I will now lay out the first layer, Homerun Distances. We start with actual data: for every batted ball, simply counting how often balls are hit at that distance, and the percentage of those batted balls that end up as homeruns. Here's the chart for 2016-2021. We see that balls under 300 feet are never homeruns, and those over 430 feet are always homeruns. In-between, we have a fairly smooth relationship. Somewhere close to 390 feet is the 50/50 point as being equally likely to be a HR as not-HR.
So, what can we do? Well, we take this model, and apply it to all batted ball distances. We therefore have an expected HR model, based on the landing distance (which I will call xHR_land). Since 2016, the league leader is Nolan Arenado with 213 xHR_land (compared to his actual 199 HR that were tracked). In other words, the landing distance of his batted balls largely conforms to his actual number of homeruns. We have a similar story with the number two, Nelson Cruz, whose long flyballs would have presumed 200 homeruns based on their distances, and he actually hit 209.
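A minimal sketch of the xHR_land idea follows. The real model is the empirical curve in the chart, which is not reproduced here, so this stand-in simply draws straight lines through the three points quoted above (0% at 300 feet, 50% at 390 feet, 100% at 430 feet). That piecewise-linear shape is an assumption for illustration only.

```python
# Stand-in for the empirical chart: (distance in feet, HR probability)
ANCHORS = [(300.0, 0.0), (390.0, 0.5), (430.0, 1.0)]


def hr_probability(distance_ft):
    """HR probability for a landing distance, by linear interpolation."""
    if distance_ft <= ANCHORS[0][0]:
        return 0.0
    if distance_ft >= ANCHORS[-1][0]:
        return 1.0
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if distance_ft <= x1:
            return y0 + (y1 - y0) * (distance_ft - x0) / (x1 - x0)


def xhr_land(distances_ft):
    """Expected homeruns: sum the HR probability of every batted ball."""
    return sum(hr_probability(d) for d in distances_ft)


print(xhr_land([250.0, 390.0, 450.0]))  # 1.5
```

Summing probabilities rather than counting balls over a cutoff is what lets a 385-foot flyball contribute a partial homerun.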
In third place is our first anomaly: Trevor Story has a tremendous number of long distance flyballs, worth 187 of xHR_land, but he actually hit only 157. In a future Layer, we'll figure out HOW (after all, that's why we are here, trying to understand HOW he hit only 157 homeruns). My presumption is that Story is not a pull hitter, and so, he hits a lot of long flyballs to the gap or in dead center. In other words, his long flyballs don't end up as homeruns as much as we'd expect from other batters.
The next layer we can add is the Launch Characteristics, but only the launch angle and speed. Every speed and angle leads to an average distance, with a certain range (standard deviation). So, this is the exciting part: knowing we can associate a speed+angle to a distance, we can then use our above model to determine the HR probability for that speed+angle. Even more so, because a speed+angle produces a range of distances, we will have a range of HR probabilities for that speed+angle. We simply average those HR probabilities to come up with the HR probability for that speed+angle.
Why is this method better than using just speed+angle directly? Because that kind of model needs a lot of data, and if it doesn't have it, it will start to grab data from "neighbors". So, a 110 mph at 30 degrees, will include data at 111 mph and 28 degrees, or 107 mph at 32 degrees, and so on. But, the batter actually hit the ball at 110 mph and 30 degrees. There's no reason to take data from neighbors. Furthermore, if you get too large a neighborhood, you end up risking the chance of not bringing in actually comparable data. You can even end up with a situation where the higher the speed, the lower the HR probability. That happens in cases where one combination of speed+angle has to extend to a larger neighborhood than another combination of speed+angle.
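The averaging step can be sketched as follows. Two things here are assumptions for illustration: the distances for a given speed+angle are treated as normally distributed around their mean with the given standard deviation, and the distance-to-HR curve is a piecewise-linear stand-in through the three points quoted earlier (300, 390 and 430 feet) rather than the actual chart.

```python
from statistics import NormalDist


def hr_probability(distance_ft):
    """Stand-in distance model: 0% at 300 ft, 50% at 390 ft, 100% at 430 ft."""
    if distance_ft <= 300.0:
        return 0.0
    if distance_ft >= 430.0:
        return 1.0
    if distance_ft <= 390.0:
        return 0.5 * (distance_ft - 300.0) / 90.0
    return 0.5 + 0.5 * (distance_ft - 390.0) / 40.0


def hr_probability_from_launch(mean_distance_ft, sd_distance_ft, steps=200):
    """Average the HR probability over the spread of distances.

    Samples the assumed normal distribution at evenly spaced quantiles
    and averages the HR probability at each sampled distance.
    """
    if sd_distance_ft <= 0:
        return hr_probability(mean_distance_ft)
    spread = NormalDist(mean_distance_ft, sd_distance_ft)
    total = 0.0
    for i in range(steps):
        quantile = (i + 0.5) / steps
        total += hr_probability(spread.inv_cdf(quantile))
    return total / steps


print(round(hr_probability_from_launch(400.0, 20.0), 3))
```

Summing this value over every batted ball a batter launches would give an xHR_launch total in the same way the landing distances give xHR_land.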
But if you focus on distances, then this problem completely goes away. And that's because, as you can see in the above chart: the higher the distance, the higher the HR probability. As it should be. This model will guarantee that outcome. This paradigm shift, by reverse engineering a distance based on the speed+angle, is what makes the Layered Hit Probability implementation possible.
With Nolan Arenado, we end up with only 149 xHR_launch (expected HR using the speed+angle launch characteristics). That is a far far cry from his actual 199 or his xHR_land of 213. In other words, we have a disconnect of 64 homeruns between how he launches the balls and how they land. We can guess that Coors is the cause here, but we'll find out in a future layer. Trevor Story has a similar effect: 143 xHR_launch compared to 187 xHR_land, or a 44 HR disconnect.
JD Martinez has a reverse effect: using the distances based on his speed+angle at launch, we'd expect 203 homeruns (xHR_launch). Instead, using the actual distances, we get 180 homeruns (xHR_land). He is short 23 homeruns, which we assume is because his parks are suppressing distances, or because the spin he imparts on the ball (hooks or slices) is keeping his distances at bay. There may be other reasons, which we'll be looking into. Whatever the reason, we want to end up with his actual 184 tracked homeruns. We'll get there, one layer at a time.
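The bookkeeping for these disconnects is just subtraction, sketched below with the numbers quoted above. The dictionary layout and the `disconnect` helper are only illustrative names, not part of any real tool.

```python
# (actual tracked HR, xHR_land, xHR_launch), as quoted in the text
batters = {
    "Nolan Arenado": (199, 213, 149),
    "Trevor Story": (157, 187, 143),
    "JD Martinez": (184, 180, 203),
}

def disconnect(x_hr_land, x_hr_launch):
    """How many more HR the landing distances imply than the launch speed+angle."""
    return x_hr_land - x_hr_launch

for name, (actual, land, launch) in batters.items():
    # Second number: launch-to-landing gap. Third: landing-to-actual gap.
    print(name, disconnect(land, launch), actual - land)
```

A positive disconnect means the balls carried farther than the launch would suggest (Arenado, Story), and a negative one means they came up short (Martinez).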
So exactly what are we after with Hit Probability? Let me bring you back a few years, as so many folks have been clamoring to introduce the spray direction into hit probability. My counter was always the same: if David Ortiz and Joey Votto hit a ball to the typical SS spot, for an easy hit for Ortiz and an easy out for Votto, do we give the same hit probability or not? Clearly, the fielding alignment matters. To include the spray direction without also considering the fielding alignment would imply that the spray direction is not only in control of the batter, but that the fielding alignment is irrelevant.
That would stop most folks in their tracks. But still other folks would ask for more from Hit Probability. So, then I ask are we talking about the launch characteristics or the landing characteristics? In other words, do we care about the spin of the ball? And pitch to pitch, not all balls are the same (you ever watch tennis and see the pros discard tennis balls after tennis balls before they serve? tennis balls are as unique as snowflakes too), so now we are ascribing the performance of the particular ball to that batter. But everyone wants the landing to be considered. So, there’s that issue. We have the unique park characteristics, we have the wind and the temperature. We have even the identity of the fielder and his performance on that play. After all, if the fielder takes a bad route, that assures a hit where we’d otherwise presume an out. Can we give an out probability of 100% on a play where the fielder was nowhere to be found?
What is clear to one person is not clear to another person. We all want something different out of Hit Probability. And at one of our many meetings on the subject with a revolving door of different folks chiming in, Savant guru Daren Willman noted, paraphrasing:
If we start to consider everything, then every play will either have a 100% or 0% Hit or Out Probability.
And that was the key for me. That’s what cemented the paradigm shift.
Everyone can see the hit. It’s 100% a hit. The question we want to ask therefore is how is it a hit, why is it a hit? How hard did the batter hit the ball? What was the launch angle? The spray direction, and where were the fielders? How good are those fielders, and how well did they perform on that play? What park was that hit in, and how hot was it, and what’s the elevation and how far was the fence? And how fast is that batter as a runner? So rather than coming up with something rather ambiguous or confusing like a 36% Hit Probability, we can instead ascribe the probability of that hit to each component of the context of the play, at that point in time and space. And the key: make sure it adds up to 100% Hit or 100% Out.
And the user can then choose for themselves what they mean by Hit Probability. They can add up only those components that they are interested in. If you are like me, and want to focus on launch speed and angle, so be it. But if others want to include other components, they can do so as well. When Nolan Arenado hits 200 homeruns since 2016, rather than say he should have hit 150 (or 175 or 190) HR if not for Coors, wouldn’t it be better to establish how much every component contributed to the 200 HR, rather than just one? How much of those 200 HR is a result of his power, or his launch angle, or his spray direction, or his many parks, and so on? How does it all add up? We have 200 actual HR, not 150 (or 175 or 190). We want it to all add up to 200.
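Here is one way to picture the accounting, as a sketch only. It assumes a simple sequential scheme where each layer is credited with the change in probability it causes, and whatever is left over is assigned to a residual so the total lands on exactly 100% (or 0%). The layer names, the probabilities and the function name are all made up for illustration. This is not a description of the finished implementation.

```python
def layer_contributions(cumulative_probs, outcome):
    """cumulative_probs: ordered list of (layer name, probability once that
    layer is included). outcome: 1.0 for a hit, 0.0 for an out.
    Returns (layer, contribution) pairs that sum to the outcome."""
    contributions = []
    previous = 0.0
    for name, prob in cumulative_probs:
        contributions.append((name, prob - previous))
        previous = prob
    # Whatever the listed layers do not explain goes to a residual bucket.
    contributions.append(("everything else", outcome - previous))
    return contributions

# Hypothetical play that ended up as a hit.
layers = [
    ("launch speed+angle", 0.36),
    ("spray direction + fielder positioning", 0.50),
    ("park, temperature, elevation", 0.80),
]
parts = layer_contributions(layers, outcome=1.0)
for name, value in parts:
    print(f"{name}: {value:+.2f}")

# A user who only cares about launch speed+angle just reads the first component.
print(sum(value for _, value in parts))
```

The user who wants more than speed and angle adds up as many components as they like, and everyone is still working from the same play that adds up to 100%.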
During the off-season, I will be describing each layer as I am developing it. The first layer will be explained shortly, focused on HR distances. Then we’ll introduce a new layer, one at a time.
This chart shows run values (per 100 pitches) by the strike zone at plate crossing, limited to 4-seam fastballs, 2018-2021, on 0-1 counts, for RHP.
Each box is a 3 inch by 3 inch square. The numbers are “floored”, meaning that “0” means 0 to 2.99 inches, and “3” means 3 to 5.99 inches and so on. (I am also including LHP data, but “mirroring” their data. So technically, all the negative side numbers are on the arm-side, while the positive side numbers are glove-side. For your sanity, just presume RHP.)
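The flooring and mirroring can be sketched like this. The function names are illustrative, and two things are assumptions on my part: that negative values are floored toward the more negative box (so -0.5 inches lands in the “-3” box), and that the raw side coordinate for an LHP simply has its sign flipped.

```python
import math

BIN_INCHES = 3

def floor_bin(inches):
    """Label of the 3-inch box that a measurement falls into.
    0 to 2.99 -> 0, 3 to 5.99 -> 3, and (assumed) -3 to -0.01 -> -3."""
    return math.floor(inches / BIN_INCHES) * BIN_INCHES

def mirrored_side(side_inches, pitcher_throws):
    """Put LHP on the RHP frame: negative is arm-side, positive is glove-side.
    Assumes the raw RHP value is already in that orientation."""
    return -side_inches if pitcher_throws == "L" else side_inches

print(floor_bin(2.99), floor_bin(3.0), floor_bin(5.99))
print(floor_bin(mirrored_side(4.0, "L")))
```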
So, what do we see? Well, at about 30 inches (2.5 feet) off the ground at close to the center of the plate, run values inside the strike zone are maximized. In other words: run values inside the strike zone peak when the pitch is down the middle. At +/- 12 inches from the center of the plate (so 24 inches wide), we see that pitches still favor the pitcher (even though the plate is 17 inches wide). When batters swing at those pitches that straddle the edge of the strike zone, that’s what happens. Once you go beyond that range, at 15-51 inches off the ground, +/-15 inches to either side of the middle of the plate, it starts to favor the batter. And beyond that, it greatly favors the batter (basically, most of those pitches are called balls).
Question
As I said, that’s at the 0-1 count. What I am interested in is this question: are those run values dependent, at all, on the prior pitch’s location, speed and/or movement? In this case, since I am looking at the 0-1 count, I am now asking about the first pitch strike. Did the kind of pitch thrown for a strike as a first pitch impact how the batter approaches the 0-1 4-seam fastball?
Commit Point
Let’s talk about the decision making region of the batter. The batter does NOT react based on where the pitch crosses the plate. He needs a certain amount of time to react. I’ve nominally set that value as 1/6th of a second (167 msec). Why 1/6th? Well, I looked at a series of checked swings, frame by frame, picking out the “point of no return”, the point at which the batter can no longer safely bail on his swing. At that point, he is committed to swing. And I found that point to be around 1/6th of a second. Interestingly enough, baseball physicist of the 1980s Robert Adair presumed it would be 175 msec. Adair had excellent instincts for his theories, given such limited data available to him.
This is how it looks for a RHB facing Jacob deGrom, trying to decide, at the commit point, whether he sees a 4-seam fastball or a slider. We can see that the trajectories hold very closely (on average), which means that a good deal of the time they intersect. And by the time these pitches reach the plate, they are off by well over a foot.
So, I’m going to do something I’ve never done before, and it’s critical to do it this way for what we are discussing. I’m going to show the run values at the Commit Point. In other words, instead of freezing the pitch at plate crossing, like above, I will instead freeze the pitch 167 msec prior to plate crossing. And that’s because it is at that point that the batter has to make his final decision to bail or continue to commit on his swing. We are taking the snapshot at the last point the batter can make his key observation.
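For those who want to see the mechanics of freezing the pitch 167 msec before plate crossing, here is a minimal sketch. It assumes a constant-acceleration trajectory (a common simplification), with y as the distance from the plate in feet, x as the side and z as the height. The function names, the coordinate setup and the sample numbers are all hypothetical, not taken from any tracking system.

```python
import math

COMMIT_SECONDS = 1.0 / 6.0  # nominal commit window from the text (about 167 msec)

def time_to_plate(y0, vy0, ay, y_plate=0.0):
    """Solve y0 + vy0*t + 0.5*ay*t^2 = y_plate for the first positive t.
    vy0 is negative because the ball travels toward the plate."""
    if ay == 0:
        return (y_plate - y0) / vy0
    disc = vy0 * vy0 - 2.0 * ay * (y0 - y_plate)
    return (-vy0 - math.sqrt(disc)) / ay

def position_at(t, p0, v0, a):
    """(x, y, z) at time t under constant acceleration."""
    return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a))

def commit_point(p0, v0, a):
    """Location of the pitch COMMIT_SECONDS before it crosses the plate."""
    t_plate = time_to_plate(p0[1], v0[1], a[1])
    t_commit = max(t_plate - COMMIT_SECONDS, 0.0)
    return position_at(t_commit, p0, v0, a)

# Made-up pitch: starts 50 ft from the plate, moving at 130 ft/sec toward it.
x, y, z = commit_point((-2.0, 50.0, 6.0), (5.0, -130.0, -5.0), (-10.0, 25.0, -20.0))
print(round(x, 2), round(y, 2), round(z, 2))
```

In this made-up example the ball is still about 20 feet from the plate at the Commit Point, which is why a chart drawn there looks like the strike zone but is not.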
This is how it looks for the 4-seam fastball, on the 0-1 count. (Click to embiggen)
Now we can see the run values by the location of the pitch at the Commit Point. While it LOOKS like the strike zone, it is not. It’s that zone at the Commit Point. We see that the run values favor the pitcher when the pitch is 0 to 15 inches toward the arm-side (where 0 inches is the line from the middle of the plate to the middle of the mound). The ideal height of a 4-seam fastball (on an 0-1 count) is 45 to 60 inches off the ground at the Commit Point. And we can see the more the pitch is away from the ideal zone at the Commit Point, the more the pitch favors the batter.
Ideal Zone at Commit Point
So, from this point onwards, we are going to focus on that Ideal Zone. At the Ideal Zone, we see the run value is roughly minus 1 run per 100 pitches. (The more minus, the more runs are reduced, and so favors the pitcher.) That’s on an 0-1 pitch, for a 4-seamer. This is what the pitcher is targeting if he’s throwing a 4-seamer. Now, we can finally ask the question:
Given that the pitcher wants to throw at this zone, does it matter what the prior pitch was? Does it matter if the prior pitch was a 4-seamer or not? Does it matter how close that first strike pitch was to our current 0-1 pitch in terms of the path it followed? Well, we can finally answer that question.
Markov Prior Pitch Type
So the first thing we’ll look at is what the prior pitch was, and what the run value is of the 2nd pitch (4-seamer on 0-1 count) that crossed the Ideal Zone.
-1.5 runs, when prior pitch was 4-seamer
-1.1 runs, when prior pitch was Cutter
-0.9 runs, when prior pitch was Sinker
-0.7 runs, when prior pitch was Changeup
-0.3 runs, when prior pitch was Slider
-0.2 runs, when prior pitch was Curve
So, the first interesting finding is if your 4-seamer (on an 0-1 count) is able to cross through the Ideal Zone at the Commit Point, it helps if the prior pitch was a 4-seamer as well. In other words, a 4-seamer first pitch strike followed by a 4-seamer in the Ideal Zone at the Commit Point is what is the most effective. The least effective is the first pitch curve followed by the well-placed 4-seamer.
Markov Prior Pitch Path
Now, what about the actual path of the prior pitch? How close does it need to be to our 0-1 4-seamer in order to be most effective?
Let’s start with the first pitch 4-seamer. When the second pitch is within 3 inches of the first pitch, the run impact is -2.3 runs per 100 pitches, which stands as the best pitch to throw. When the second pitch is between 3 and 9 inches of the first pitch, the run impact is -1.5 to -1.6 runs per 100 pitches. And the more the 2nd pitch deviates from the first pitch, the less effective is that 2nd pitch. In other words: consistency.
I should note that this is at the league level. If there is a bias (and I’ll look for it next time), it would be based on the identity of the pitchers. Until I run that check, everything I’ve said is not definitive (but it is promising). This is the chart for the 4-seamer, based on how much off the trajectory the first pitch is, at the Commit Point:
-2.3 runs: 0 to 2.99 inches
-1.5 runs: 3 to 5.99 inches
-1.6 runs: 6 to 8.99 inches
-0.9 runs: 9 to 11.99 inches
-0.6 runs: 12 to 14.99 inches
-0.2 runs: 15 to 17.99 inches
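As a sketch of how a pair of pitches would be slotted into this chart: measure how far apart the two pitches are at the Commit Point, floor that to a 3-inch bucket, and look up the run value listed above. I am assuming here that the deviation is the straight-line distance between the two (side, height) locations; the function names are just for illustration.

```python
import math

# Run values per 100 pitches for back-to-back 4-seamers, from the chart above.
RUN_VALUE_BY_DEVIATION_BIN = {0: -2.3, 3: -1.5, 6: -1.6, 9: -0.9, 12: -0.6, 15: -0.2}

def deviation_bin(first_pitch, second_pitch):
    """Floored 3-inch bucket of the distance (inches) between two pitches
    at the Commit Point. Each pitch is a (side, height) pair in inches."""
    distance = math.hypot(second_pitch[0] - first_pitch[0],
                          second_pitch[1] - first_pitch[1])
    return int(distance // 3) * 3

def run_value(first_pitch, second_pitch):
    """Chart value for this pair, or None if it is beyond the listed buckets."""
    return RUN_VALUE_BY_DEVIATION_BIN.get(deviation_bin(first_pitch, second_pitch))

print(run_value((0.0, 48.0), (2.0, 49.0)))   # about 2.2 inches apart
print(run_value((0.0, 48.0), (4.0, 51.0)))   # 5 inches apart
```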
Now, how about if the 1st pitch was a sinker? In that case, the results were really all over the place. The pattern was up-and-down, thereby suggesting that throwing the 2nd pitch 4-seamer is not dependent on the path of the 1st pitch sinker. But, more work to be done there.
When the 1st pitch is a cutter: it was most effective when the two pitches were within 6 inches of each other, with a run impact of -1.8 runs. So, pairing the cutter-4seamer, along the same path (at the Commit Point) was very effective.
When the 1st pitch is a changeup: the WORST path is when the changeup and 4-seamer shared the same path. In other words, starting with a 1st pitch changeup and then throwing a 2nd pitch 4-seamer, the pitcher does NOT want the two paths to be the same, as this is PLUS 0.2 runs (per 100 pitches). Taking a guess here: the batter is sitting on a 4-seamer, and the pitcher has given the batter a roadmap with the changeup. The batter will be able to jump on the 4-seamer. The most effective 4-seamer on the 2nd pitch, when the 1st pitch is a changeup, is to have a deviation of at least 6 inches.
How about when the 1st pitch is a slider? This one is also all over the place. The most effective first pitch slider had a deviation of at least 9 inches, or at most 3 inches. The least effective 2nd pitch 4-seamer is when it deviates from the 1st pitch slider by 3 to 9 inches.
Finally, the 1st pitch curve: results are also all over the place, so no firm conclusions to draw.
Next Step
As I noted, I need to break this down at the individual player level to see how generally the trends hold, especially with back-to-back 4-seamers.
And of course, I’ll be looking at all the other plate counts, starting next with the 1-0 count and working our way toward the 3-2 count. There are 12 plate counts, so, that means at least 12 blog posts.
On June 21, 2018, Mookie Betts of the Red Sox made this easy catch (video) on Joe Mauer at Target Field (with RHP Porcello), whereby Betts did not need to move at all, as the top image shows. Where Betts was standing was where Mauer hit the ball. (Click to embiggen.)
On the bottom left is the location of every RF against Joe Mauer at Target Field, in 2018, against RHP.
That image is blown up into the right image. The three times that Mauer faced the Red Sox (all on the same day) are shown in red, with the larger red dot being the play in question. The Catch Probability was 99.5% in all three instances.
The yellow dots are all the non-Red Sox fielders (against Mauer, Target Field, 2018, RHP). The average Catch Probability of all those fielding alignments was 93%, with that far out blue dot at 10%.
Interlude: you might think the dot below the blue one should be even lower, and… well… you are right, for the RF, it is lower. But in that particular fielding alignment, the CF was closer to where Mauer hit the ball if we were to overlay his batted ball on that fielding alignment; and the CF on that play had a 27% catch probability. Hence, the fielding alignment in that particular case is set to 27% Catch Prob (except that it is for the CF, not the RF).
Had the Red Sox positioned themselves based on a random non-Red Sox alignment, but selected from ACTUAL alignments (against Mauer, Target, 2018, RHP), the expected catch probability is 93%, compared to the actual catch probability of that particular instance of Betts v Mauer of 99.5%. And therefore, we credit the Red Sox with +0.065 outs for their Defensive Positioning.
And all we have to do is apply this concept to every single batted ball. In every instance, we are comparing the actual single instance fielding alignment (“with”) to the distribution of instances not involving that fielding team (“without”) against that batter, venue, year, pitch-hand.
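In code form, the with-or-without comparison for one batted ball can be sketched as below. The function names are illustrative. Taking the best fielder's catch probability as the alignment's catch probability is my reading of the CF/RF interlude, and the individual “without” values in the example are made up; only the 99.5% for the Betts play comes from the text.

```python
def alignment_catch_prob(per_fielder_probs):
    """Catch probability of an alignment: the fielder with the best chance.
    (Assumed from the interlude, where the CF at 27% overrides the RF.)"""
    return max(per_fielder_probs.values())

def positioning_credit(actual_catch_prob, without_catch_probs):
    """Outs credited to positioning: the actual alignment ('with') minus the
    average of the other teams' alignments ('without') for the same
    batter, venue, year and pitch-hand."""
    baseline = sum(without_catch_probs) / len(without_catch_probs)
    return actual_catch_prob - baseline

# Hypothetical 'without' alignments overlaid on the same batted ball.
# The 0.05 for the RF is invented; the text only says it was below 10%.
without = [
    alignment_catch_prob({"RF": 0.99}),
    alignment_catch_prob({"RF": 0.97}),
    alignment_catch_prob({"RF": 0.05, "CF": 0.27}),
]
print(positioning_credit(0.995, without))
```

With the real distribution averaging 93%, the same subtraction gives 0.995 - 0.93 = +0.065 outs.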
As of right now, I only have it for the outfield. And as a necessary condition of doing WOWY (with or without you), this means that we can’t use Mauer on the road, since the road site will always have the same fielding team.(*)
(*) Almost! In rare instances, this is not true. As well, this is not true for traded players mid-season.
tldr: home-to-first, use both; everywhere else, use Sprint Speed
***
Generally speaking, we have a lot of problems with the base-to-base times (other than home-to-first).
First, you have the lead: no one actually stands on the base, but rather they are 10-20 feet from the base when the batter makes contact.
Secondly, they aren’t running all out until the play is clearly a basehit (or a tag up). So, that plays a role. And even on most base hits, they aren’t running all-out. When they do run all-out, this usually leads to…
Thirdly, we have slides, which of course are meant to slow down players intentionally.
Finally, we have the imprecision of when to stop the clock, because it’s hard to figure out exactly how to do that, especially on slides.
All of those things conspire against getting us really useful base-to-base times (other than home-to-first), unless things really line up for us. On the other hand, finding a one-second window where a runner gets to top speed: that’s really easy. Our sample size explodes in those cases, not only on the basepaths, but also for fielders. We can directly compare Byron Buxton running in the outfield and running on the basepaths. In short, Sprint Speed has broad applications.
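Finding that one-second window is simple enough to sketch. This is only an illustration of the idea, not the official Sprint Speed calculation: it assumes the tracking data has already been turned into cumulative distance run (in feet) at a fixed sampling rate, and the function name is made up.

```python
def best_one_second_speed(cumulative_feet, samples_per_second):
    """Most feet covered in any one-second window, i.e. top speed in ft/sec.
    cumulative_feet[i] is the distance run up to sample i, at a fixed rate."""
    n = samples_per_second
    if len(cumulative_feet) <= n:
        raise ValueError("need more than one second of tracking data")
    return max(cumulative_feet[i + n] - cumulative_feet[i]
               for i in range(len(cumulative_feet) - n))

# Hypothetical run sampled twice per second: slow start, then top speed.
print(best_one_second_speed([0, 1, 3, 6, 9, 12], samples_per_second=2))
```

Note that nothing in there cares where the runner started, whether he slid, or when to stop the clock, and it works just the same for a fielder chasing a ball.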
So any time you care about any running (other than home-to-first), Sprint Speed is your best bet. And, if you are going to use Sprint Speed everywhere else, you may as well use it home-to-first as well. (Though of course, home to first time should often accompany it in those cases.)
And this is the story of How I met Sprint Speed.
***
If you want to learn more about all the components of speed and running, I highly suggest this great piece courtesy of Travis Petersen, Eddie Elliott, and Daren Willman.
Unlike all the other fielding metrics before Statcast, we know the location of the ball and fielders at all times. What this allows us to do is model reality in a very intuitive manner.
You as a fan can fairly easily judge a fielding play because your eyes can measure distance and time based on your experiences, much like you can figure out when you can and cannot safely cross the street based on your experiences. There is of course that gray area because the "eye test" can only get you so far. You can tell when you can 100% cross and not cross the street safely. But you wouldn't be able to tell for example a 25/75 from a 75/25 situation. But, if you knew the exact distance and speed of the oncoming traffic and you knew how much distance you had to cover and how fast you can cross the street, and had a handy calculator to instantaneously tell you the results, then, yes, you can distinguish between a 25/75 from a 75/25 traffic situation.
So that's where Statcast comes in: we know the exact location of the fielders and the ball, and the time the fielder can get there and retrieve (or miss) the ball. We know how fast the batter can run to first and where he is when the fielder picks up the ball. Basically everything you as a fan are measuring with the eye test in a very intuitive manner we can actually measure and convert into time: will the runner beat the ball or not? This is why Statcast Infield Defense works: it works because we are actually modeling reality.
This chart shows how often an infielder (2B, SS, 3B) makes the play based on how many feet they have to cover. Along the horizontal: Negative is toward 3B and positive toward 1B. Along the vertical: Negative is toward home plate and Positive is behind the fielder. "0" means "0 to 4.99 feet" and so on. The numbers represent the out rate. The green box is the starting point of the infielder.
In this simplified view, I am NOT showing how much time the infielder needs. That's why it looks uneven in some cases. So, you are getting half the view: just measuring distance. And even at that, you can see a pretty strong relationship. Statcast Infield OAA also includes the time the ball gets there, as well as the location and speed of the batter. See the above link for a more complete description.