
Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Bat_Tracking

Sunday, February 16, 2025

Bat-Tracking: Timing Early/Late

Previously: Timing Along the Bat, Timing Over/Under

One of the challenges of figuring out if a swing is early or late is determination of the presumed point of contact. What is a late contact swing and what is an early contact swing?

A simple approach is to choose a common point, like the front of home plate. But, you can imagine that each batter is going to have a different approach. And so, we discarded that option in favour of it being batter-specific. Even in that case, the batter is moving up and down the batter's box, or at least they could. And so, we made sure to make it based on the center of mass for that batter for that swing. Even in THAT case, we cannot just choose a common point, like 30 inches in front of them. This is where EV50 comes to save the day once again. Paul Goldschmidt hits his Best Speed (50% hardest hit, the EV50) balls 20 inches forward from his center of mass, while Justin Turner is closer to 40 inches. And therefore, I am setting the Ideal Timing Point in this manner: every batter (and their batside) has a unique forward point relative to their center of mass.

From here, we then simply ask this question: when the ball and bat cross paths, or come closest to crossing paths (what I call the Intercept Point), where is that Intercept Point compared to the Ideal Timing Point (for that batter-batside)? In this way, we can establish if a swing was early, late, or on-time.

With regard to the Over/Under, about two-thirds of the swings were within 2 inches of the centerline of the swing path. Similarly for TiedUp/Flail, a range of +/- 4 inches from the sweetspot ring gives us two-thirds of the swings.

In order to establish a time range for on-time, any swing where the ball-bat Intercept Point was within 7 milliseconds (roughly 10 inches) of the Ideal Timing Point is considered an On-Time Swing. Otherwise, the swing is either Early or Late.
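
As a rough sketch of that rule in Python: the function below labels a swing from the signed time gap between the Intercept Point and the Ideal Timing Point. The function name, the sign convention (positive means the bat arrived early) and the inclusive 7 msec boundary are assumptions for illustration, not something taken from the tracking system.

```python
# On-time window from the text: within 7 msec of the Ideal Timing Point.
ON_TIME_WINDOW_MS = 7.0


def classify_timing(offset_ms: float, window_ms: float = ON_TIME_WINDOW_MS) -> str:
    """Label a swing as early, late, or on-time.

    offset_ms is the Intercept Point minus the Ideal Timing Point, in msec.
    Assumed convention: positive = bat got there early, negative = late.
    """
    if abs(offset_ms) <= window_ms:
        return "on-time"
    return "early" if offset_ms > 0 else "late"


for offset in (-12.0, -3.0, 0.0, 7.0, 15.0):
    print(offset, classify_timing(offset))
```

If your data uses the opposite sign convention, flip the comparison in the last line of the function.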

These charts below represent all swings. The first chart is the run value, and we see that the best performing swings are the early swings. And if you think about pulled homeruns, I think this becomes easier to understand.

The second chart shows the frequency of the swings. The median and mode are a slightly late swing, even though the overall average swing is almost exactly on-time.

Again, we will learn in the coming weeks, on a batter by batter basis, how often their swings are early, late or on-time.

(2) Comments • 2025/02/19 • Bat_Tracking

Bat-Tracking: Timing Over/Under

See part 1, for Timing Along the Bat.

A plane is two-dimensional. A thin sheet of paper is the simplest version of a plane (even though technically a sheet of paper has thickness, but let's not let technical precision get in the way of a natural illustration).

When we talk about a swing plane, there are two things that make this not true. The first is that the bat itself has thickness. Whereas we can accept that a single sheet of paper is two-dimensional, a stack of 500 sheets of paper is obviously three-dimensional. A bat is much more like 500 sheets of paper than a single sheet. And so, a swinging bat is obviously not going to trace out a plane, but rather a slab.

Even if the bat itself were two-dimensional, the other is that the bat is not going in a circular path, and so it is not tracing out a single plane (a single sheet of paper). Because the wrists and arms of the batter are involved, the bat may look like it's tracing a single plane, but it is in fact going through multiple planes.

As a result, rather than using the inaccurate term of plane, we will use the more descriptive term of path. We will be talking about the swing path (which if you want to think of it like a swing plane, you may do so).

As I mentioned, the swing path is really a slab, a two-by-four if you wish. That's the coverage area the swing is tracing out. As we mentioned in a previous article, the slab that the swing path goes through is roughly +/- 2 inches. If the ball goes over this slab, then this means the bat went under the ball. And we naturally have the opposite: the slab of the swing path going over the ball.

And we see here the data bears this out (click to embiggen). This data is as earlier: we focus on the ideal part of the bat, the +/- 1 inches around the sweetspot, and then look at the results.

When the ball goes flush against this two-by-four slab of a swing path in the sweetspot, 95% of these swings lead to an EV50 (as we established in the previous article, an EV50 is a best speed batted ball for that batter). And any ball that is more than 3-4 inches from the centerline of the swing path is 100% whiff. And so, we are now in a position to establish for every batter, and every swing of that batter, how often they swing under, swing over, and swing in-slab. (Feel free to suggest a better term. It's ok to say you don't like to say in-slab, but I need an alternative that is better than in-slab. Saying Not-in-slab is not helpful to a baseball fan, though it is the perfect thing to say if you are a politician.)

In terms of performance, we see a slight bias toward hitting under the ball (the right side of the chart is the ball going over the swing path). And this makes sense: slightly hitting over the ball is a grounder; slightly hitting under the ball is an airball. And you have a better chance of good production in the air than on the ground. We can see the most damage happens from around the center of the ball being half-an-inch below the centerline of the swing path to one-inch above the centerline of the swing path.

I can tell you that league-wide, 12% of swings have the swing path go over the ball, while 24% of swings have the swing path go under the ball. And this makes sense, since we don't want to hit the ball exactly flush, but with a slight offset. So naturally, when you miss, you will miss more often by going under the ball than over the ball. We'll learn in the coming weeks how often batters line up the ball in their swing path in-slab, as well as how often they miss over or under.
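
The over / under / in-slab labels can be sketched in Python as follows. This assumes a hypothetical input, the height of the ball's center relative to the centerline of the swing path at the Intercept Point (positive meaning the ball is above the path), and treats the roughly +/- 2 inch slab from the text as an inclusive cutoff. The function name is made up for illustration.

```python
# Half-thickness of the swing-path slab, roughly +/- 2 inches per the text.
HALF_SLAB_IN = 2.0


def classify_vertical(ball_offset_in: float, half_slab_in: float = HALF_SLAB_IN) -> str:
    """Label a swing by where the ball passed relative to the swing path.

    ball_offset_in: ball center minus swing-path centerline, in inches
    (assumed convention: positive = ball is above the path).
    """
    if ball_offset_in > half_slab_in:
        return "swing under"  # ball went over the slab, so the bat went under
    if ball_offset_in < -half_slab_in:
        return "swing over"   # ball went under the slab, so the bat went over
    return "in-slab"


for offset in (3.5, 1.0, -0.5, -2.5):
    print(offset, classify_vertical(offset))
```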

Bat-Tracking: Timing along the bat

DIMENSIONS

The thickness of a baseball bat around the sweetspot is about 6 cm, which is 2.36 inches. That sets the centerline of the bat as 1.18 inches from the outer edge of the bat.

The diameter of a baseball is about 7.4 cm, which is just over 2.91 inches. That sets the radius of a baseball as 1.46 inches.

So, when the center of the baseball and the centerline of the bat are 1.46 + 1.18 = 2.64 inches apart, this means the bat and ball will graze each other.
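
For anyone who wants to check the arithmetic, here is a small Python sketch of the same conversion. The variable names are just for illustration; the 6 cm and 7.4 cm figures are the approximate values quoted above.

```python
CM_PER_INCH = 2.54

bat_diameter_cm = 6.0   # approximate bat thickness around the sweetspot
ball_diameter_cm = 7.4  # approximate baseball diameter

bat_radius_in = bat_diameter_cm / 2 / CM_PER_INCH
ball_radius_in = ball_diameter_cm / 2 / CM_PER_INCH

# Distance between the ball center and the bat centerline at which they just graze.
graze_distance_in = bat_radius_in + ball_radius_in

print(f"bat radius:     {bat_radius_in:.2f} in")
print(f"ball radius:    {ball_radius_in:.2f} in")
print(f"graze distance: {graze_distance_in:.2f} in")
```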

In order for the bat to do any kind of damage to the ball, the center of each has to be about a maximum of 2 inches apart.

OBJECTIVE

While there is more to hitting a baseball than how hard you hit it, it's the objective of every batter to square up on a ball as much as possible. One of the ways to measure the performance of a batter is to focus on how hard they hit the ball. To that end, one thing that I like to do is focus on all batted balls, and for each batter (and by bat-side for switch hitters) take their 50% hardest hit balls. The average of these balls with the highest Exit Velocity (EV) we label as EV50, and we can call them Best Speed.

Interlude

One small note for you percentile jocks: when you take the 95th percentile of something, that is very similar to taking the 10% highest of something and figuring the average of those things. Similarly, the 90th percentile value is very similar to the average of the 20% highest. In other words, the single value percentile number is at the midpoint of a distribution. Of course, because we are looking at the tail-end of a distribution, it doesn't work out exactly like that, but you can think of it in that way. So, the EV50 would be equivalent to the 75th percentile. In other words, EV50 is EV75th. So, why use EV50 instead of EV75th? Well, when you take a percentile, you are taking a single value as representing everything around it. Why do people do that? Because it's really really easy to do that. But, if you want to better represent ALL of the data in question, then we should take ALL of the data in question and average that. So, that's why I eschew percentile values and focus on the average of all the points above a line.
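
To make the EV50 versus 75th percentile comparison concrete, here is a minimal Python sketch. The exit velocities are hypothetical, evenly spread values rather than real data, the function names are made up, and for an odd number of batted balls the top half is rounded down, which is an assumption.

```python
from statistics import mean, quantiles


def ev50(exit_velocities):
    """Average of the 50% hardest-hit balls (top half, rounded down)."""
    ranked = sorted(exit_velocities, reverse=True)
    top_half = ranked[: max(1, len(ranked) // 2)]
    return mean(top_half)


def ev_75th_percentile(exit_velocities):
    """Single-value alternative: the 75th percentile (third quartile)."""
    return quantiles(exit_velocities, n=4)[2]


# Hypothetical batter: 100 batted balls spread evenly from 60.0 to 109.5 mph.
evs = [60.0 + 0.5 * i for i in range(100)]

print("EV50:", ev50(evs))
print("75th percentile:", ev_75th_percentile(evs))
```

On an evenly spread sample like this one the two numbers land close together. With real, skewed exit velocity data they can drift apart, which is the case for averaging all the points above the line.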

EV50

The EV50 is perhaps the most important value to look at for a batter. You can see that value on two leaderboards on Savant: the Exit Velocity leaderboard, and the Custom Leaderboard.

When we look at an EV50 batted ball, we know something that is true: the batter hit the ball pretty close to the sweetspot of the bat. Does the tracking data bear this out?

This is what I did. For each batter, I determined how far in front of them they hit these EV50 batted balls. For the typical batter, this is about 30 inches forward from their center of mass, with a range of 20 inches (Paul Goldschmidt) to 40 inches (Justin Turner). So, everyone has their preferred timing spot. So, I limit the swings to when contact is made within 7 msec (about 10 inches) from their ideal contact spot. I further limit the swing to balls hit within half-an-inch of the centerline of the bat.

RESULTS

With all the ideal conditions for the swing determined, this leaves us with 4% of all MLB swings; all that is left is to see how often an EV50 swing resulted, based on its proximity to the sweetspot of the bat (click to embiggen):

If you focus on the peak point, that says that 95% of swings that hit within half-an-inch of the centerline of the bat, and on the sweetspot, resulted in an EV50 batted ball. This also suggests that any chance of doing any damage at all has to be within 6 inches of the sweetspot.

One note about the sweetspot and the bat length. The bat length has been fixed at 83 cm, or 32.7 inches. We don't have each batter's actual bat length available in the tracking system, and so, a nominal value was selected. And the sweetspot was set at 6 inches from the head of this bat. This is why the 100% whiff rate only starts at 10 inches from the sweetspot, which is 4 inches beyond this nominal bat. As the ball radius is 1.46 inches, the 100% whiff rate should be there. However, differing bat lengths and limitations in the precision of the tracking give us the results we see. While not ideal if this is the year 2035, it is an astounding value to have available in 2025. If you think of where we were in 2015, we're 90% of where we want to be today, which is a huge leap.

Another way to measure the performance of the batter is based on actual outcomes, whether it's a HR, a hit, or a whiff. We do this with Run Values. And for purposes of this article, I'm going to calculate the run values of each swing, relative to the average run value for that batter. In this way, our overall average is zero, and each batter is compared to themselves.
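
A minimal Python sketch of that centering step, assuming a hypothetical list of (batter, run value) pairs, one per swing. The names and numbers are invented for illustration.

```python
from collections import defaultdict


def center_by_batter(swings):
    """Subtract each batter's own average run value from each of his swings.

    swings: list of (batter, run_value) tuples.
    Returns a list of (batter, run_value_relative_to_own_average).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for batter, run_value in swings:
        totals[batter] += run_value
        counts[batter] += 1
    averages = {b: totals[b] / counts[b] for b in totals}
    return [(b, rv - averages[b]) for b, rv in swings]


# Hypothetical run values per swing.
swings = [("A", 0.10), ("A", -0.06), ("A", 0.02), ("B", -0.06), ("B", 0.50)]
centered = center_by_batter(swings)
for batter, rv in centered:
    print(batter, round(rv, 3))
```

After centering, each batter's swings average to zero, so the overall average is zero as well.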

When we do that, we see that a batter has a bit more margin of error when they miss the sweetspot closer to themselves, rather than toward the head of the bat. Focus on the +3 inches of both charts. Here we see that 50% of the swings still result in an EV50. And that overall, they get their own average production. But look at the symmetrical -3 inches: those swings result in an EV50 70% of the time. And as a result, their overall production is still +15 runs per 100 swings.

Indeed, at about -7 inches from the sweetspot (so 13 inches down from the head of the bat), the batter gets the same production as at +3 inches. Even though at -7 inches, almost none of the batter's swings are EV50. Mishits that lead to singles are still singles. While not something you want in a batter, it does show that when you mishit, you want to mishit closer to the label rather than mishit closer to the head.

NEXT TIME

There are three dimensions in timing a swing. We have this one I just showed, in terms of trying to line the ball up to the sweetspot of the bat. In other words, a spectrum from Flail / SweetspotRing / TiedUp

The next two I will present in a future article:

Monday, December 23, 2024

Swing Speed and Acceleration Curves

One thing I do is classify every batted ball as whether it is a Best-Speed or not. Best-Speed is simply the 50% of swings with the highest launch speed. The reason we do this is because we are often interested in swings when good things happen. As for the number of swings chosen, the top 50% corresponds most closely with a batter's talent.

A batter will also not have the same swing each time even limited to these best-speed swings. Acceleration peaks at about 50 msec prior to impact, but not always. The range player to player is wide. And even swing to swing for the same player, there is variation. So, I also flag the best-speed swings where he reaches his personal most common acceleration point.

What I will now show you is the average of the 38 optimal swings for Giancarlo, frame by frame. At 300 frames per second, this means we are tracking 3 frames for every 10 milliseconds. So, on the x-axis, where you see -3, that means it is 10 msec prior to impact. The chart starts at -30, or 100 msec prior to impact.
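
The frame-to-time conversion is simple enough to sketch in a couple of lines of Python. The function name is made up; the 300 frames per second is from the text.

```python
FRAMES_PER_SECOND = 300


def frames_to_ms(frames: float, fps: int = FRAMES_PER_SECOND) -> float:
    """Convert a frame offset (negative = before impact) to milliseconds."""
    return frames * 1000 / fps


for frames in (-30, -15, -3, 0):
    print(frames, "frames =", frames_to_ms(frames), "msec")
```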

Click to embiggen

On the left y-axis, that's the swing speed, which peaks at almost 84 mph (solid line). This is essentially close to the human limit.

On the right y-axis, that's the acceleration, in the form of change in speed, frame to frame. I am showing it as a dotted line. We see that Giancarlo has a peak acceleration at close to 15 frames (50 msec) prior to impact. Since we are looking at the best-of-best, we can also surmise that this is exactly what Giancarlo wants to do.

Math interlude: We can see he gains just over 3 mph (3.2) per frame, or 3.2 mph per 3.33 msec of acceleration. 3.2 mph is 4.7 feet per second. Therefore 4.7 ft/s per 3.33 msec is 1410 feet per second-squared. Gravity is 32.174 feet per second-squared, or one g-force. Giancarlo is 1410 / 32.174 = 44. That's how much acceleration Giancarlo generates with his swings: 44 g-force.
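
The same calculation as a short Python sketch, for anyone who wants to plug in other values. The function name is made up; the inputs (3.2 mph per frame, 300 frames per second, 32.174 ft/s² for gravity) are the ones quoted above. Carrying full precision instead of rounding to 4.7 ft/s gives a touch under 44.

```python
MPH_TO_FTPS = 5280 / 3600   # feet per second in one mph
GRAVITY_FTPS2 = 32.174      # one g, in feet per second-squared
FRAMES_PER_SECOND = 300


def g_force(mph_gained_per_frame: float, fps: int = FRAMES_PER_SECOND) -> float:
    """Acceleration in g, from the speed gained over one frame."""
    delta_v_ftps = mph_gained_per_frame * MPH_TO_FTPS
    accel_ftps2 = delta_v_ftps * fps  # dividing by the frame time (1/fps seconds)
    return accel_ftps2 / GRAVITY_FTPS2


print(f"{g_force(3.2):.1f} g")
```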

All that is fun, but let's look even more. We can also classify all his whiffs (swing and misses) by whether they are early or late. So, here's the same chart, but with the blue lines for his late swings. We can see by the curves that he is late by about 3 frames (10 msec).

And now the green lines to add in his early swings, which you can see are early by about 15 msec.

And if you want to see a bit of fun, here are all his ontime, early and late swings, but shifting his whiffs forward/backward.

What we will learn in the coming years is the Aging Curves of early/late swings. As batters lose their swing speed, are they able to adapt? Or, are they simply obstinate and keep trying the same thing over and over again, so that the late swings overtake the early swings more and more as they age? So much to still learn...

Saturday, December 14, 2024

Should a batter have a steeper or flatter swing (part 2)?

Yesterday, I went thru the process of identifying the steepness of the swing path, the speed of the swing, and the location of the pitch, with a focus on low pitches.

Today, I will look at high pitches. I made a few updates/improvements to that process. First, I created 4 groupings of each of these three parameters. And each of the individual groupings has 25% of the data. I also limited it to the 329 batters with the most swings (technically, it's the batter+batside), so roughly 11 batters per team. So, each of the 64 bins has around 3700 swings, all perfectly proportioned. Ohtani for example has 19 or 20 swings in each of these 64 bins. The minimum number of swings is 64 x 6 for any batter (so min 6 swings in any bin for a total of at least 384 swings).

I will show you the whiff rate, along with the squared up rate (given that the ball was contacted).

(Click to embiggen)

For pitches that are high, the whiff rate is maximized on the slower/flatter swing. Which may be surprising. However, given that the ball is contacted, the squared up rate is maximized with a slower/flatter swing.

What I like to do is look at overall production, through Run Values. A new thing I've been using recently is deleveraged Run Values. Whatever the run value is for a swing, I remove the base-out leverage aspect to it so that every plate appearance is equally weighted. I should probably write about this. This does keep it in-line with how Fangraphs is showing run values.

The key number to remember is that a whiff (swing and miss) is worth around minus 6 runs per 100 pitches. It is possible to be worse than that, since an easy batted ball out is worth minus 27 runs per 100 swings. This is why it is possible that a whiff is preferred to putting a ball in play. If that ball in play is an out 67% of the time (minus 27 runs per 100 swings) and a non-HR hit 33% of the time (plus 54 runs), then that gives us an average of 0 runs relative to average. An out 75% of the time means the run value is -6.75 runs per 100 swings. The breakeven point is a .260 BABIP. Given the league average BABIP is .290 or so, you can see how, on a pitch off the plate, whiffing altogether (unless there are two strikes, naturally) or fouling it off can be preferred to putting it into play.
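
Here is the breakeven arithmetic as a small Python sketch. The run values (minus 27 for an out, plus 54 for a non-HR hit, minus 6 for a whiff, all per 100 swings) are the ones quoted above; the function names are made up for illustration.

```python
OUT_RV = -27.0    # runs per 100 swings, batted ball out
HIT_RV = 54.0     # runs per 100 swings, non-HR hit
WHIFF_RV = -6.0   # runs per 100 swings, swing and miss


def in_play_run_value(babip: float) -> float:
    """Run value per 100 swings for balls in play at a given BABIP."""
    return babip * HIT_RV + (1 - babip) * OUT_RV


def breakeven_babip(target_rv: float = WHIFF_RV) -> float:
    """BABIP at which a ball in play is worth the same as target_rv."""
    return (target_rv - OUT_RV) / (HIT_RV - OUT_RV)


print("BABIP .333:", round(in_play_run_value(1 / 3), 2))
print("BABIP .250:", round(in_play_run_value(0.25), 2))
print("breakeven BABIP vs a whiff:", round(breakeven_babip(), 3))
```

Any pitch location where the expected BABIP falls below that breakeven is one where contact costs more than the whiff, at least before two strikes.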

Anyway, here's the results for the run values for all 64 bins.

First, we get the uninteresting results that the Faster swings produce more value than the Mid-Fast, which produce more value than the Mid-Slow which produce more value than the Slower swings in every comparison (except one). As I noted yesterday, there's a cause-effect relationship here. If a batter has committed to his faster swing, chances are everything is lined up for him in terms of timing. So, he has little reason to slow or alter his swing.

We also get the uninteresting result that the 25% highest pitches produce less value than the next 25% highest pitches (the mid-high). And the 25% lowest pitches produce less value than the next 25% lowest (the mid-low). Those extreme pitches are essentially out of the strike zone. Again, everything makes sense directionally.

STEEP V FLAT

But, the fun part is the Vertical Swing Path, comparing a batter who goes Steeper versus going Flatter. And the way we've laid it out, we've controlled for the identity of the batter, the location of the pitch, and the swing speed (essentially decision/timing) of the batter.

With the higher pitches, among the slow/mid-slow swings, the steep/flat swings show no difference. These are pitches that the batter should not have swung at, which we can tell by the high location and slower swing speed. And whether they tried to go flat or steep, it didn't have any effect.

Only among the Faster swing do we notice what we expect: a Flatter swing on those high pitches is preferred to a Steeper swing. Directionally it makes sense. And in terms of Magnitude, that's 0.5 runs per 100 swings of benefit for the Flatter swing on high pitches that the batter commits to.

If we focus on the Lower pitches, there we get again the expected result: the Steeper swing is better than the Flatter swing. Not just better, but far far far better. Why a batter would try to keep his swing Flat on a pitch that is low that he is committed to swing at is pretty confusing. And that run value of -5 runs per 100 swings is essentially equivalent to a whiff.

Indeed, even on a pitch that is Mid-Low, the Flat swing is very very very poor in comparison to any other swing, even a mid-Flat swing.

Across the board, we can pretty much come to the conclusion that a Steeper swing is preferred to a Flatter swing (except for those higher pitches). And remember, this is controlling for the identity of the batter. On average, the batter's 25% Steeper swings, compared to that batter's 25% Flatter swings, generate more production. The run values per 100 swings, controlling for the swing speed and location and identity of batter:

  • -1.4 runs: 25% STEEPER
  • -1.3 runs: 25% MID-STEEP
  • -1.5 runs: 25% MID-FLAT
  • -3.0 runs: 25% FLATTER
(1) Comments • 2024/12/18 • Bat_Tracking

Friday, December 13, 2024

Should a batter have a steeper or flatter swing?

There is a lot to untangle in swings, since there's not a natural cause-effect relationship. A batter doesn't choose to swing at 75mph. He starts off at zero naturally, and it builds up to some point, and the speed is maximized at impact. So, there's a timing component there. A batter can slow his swing down if he mistimes by swinging too early. Then there's the location of the pitch, where if the batter doesn't abandon his swing, he's likely swinging with a steeper angle on low pitches, while high pitches probably require the batter to go flatter. All to say, the batter doesn't choose a single set of parameters, and just swings. There's a cause-effect-cause-effect relationship throughout the swing.

That said, let's see what we can do here to untangle some of that. First, I'll tell you what I did.

THE STUDY

For every batter, I identify the height of the ball, swing by swing. The 30% highest of those pitches, I put in the HIGHER group. The 30% lowest of those pitches, I put in the LOWER group. The remaining 40% are in the MID group. This ensures that regardless of the height of the batter, be it Altuve or Judge, each of the batters will have 30% of their swings in the HIGHER group, and so on. We have proportionate representation. No bias.

For each of those groups of swings, for each batter, I create five speeds of swings:

  • 15% FASTEST
  • 20% FASTER
  • 30% MID
  • 20% SLOWER
  • 15% SLOWEST

Again, we have proportionate representation. I take Giancarlo's 15% fastest swings (in the HIGHER group) and put those in the FASTEST bucket. And same for the 15% of his fastest swings (in the LOWER group) and also put those in the FASTEST bucket. Again, we have perfectly proportionate representation, with no batter biasing any of the groupings.

And I repeat that for the Vertical Swing Path. His 15% steepest swings (from the HIGHER / FASTER group) I put in the STEEPEST group. I repeat all that similarly:

  • 15% STEEPEST
  • 20% STEEPER
  • 30% MID
  • 20% FLATTER
  • 15% FLATTEST

So, we have three groups of swings, each properly proportionate at the batter level, so there is no bias.  Altogether, we will have 75 possible combinations.  But some of those barely register for number of swings, and so we end up with 63 combinations of categories with any meaningful data.
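
A minimal sketch of the proportionate grouping for a single batter, in Python. It ranks that batter's swings on one measure and hands out labels by share (15/20/30/20/15 here). The function name, the rounding of group boundaries, and the sample swing speeds are assumptions for illustration; the study's exact handling of ties and boundaries is not described.

```python
SPEED_GROUPS = [
    ("FASTEST", 15),
    ("FASTER", 20),
    ("MID", 30),
    ("SLOWER", 20),
    ("SLOWEST", 15),
]


def assign_groups(values, groups):
    """Label one batter's swings by rank, highest value first.

    values: one number per swing (for example swing speed).
    groups: list of (label, percent share), shares summing to 100.
    Returns a list of labels in the same order as values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [None] * n
    start, cumulative_pct = 0, 0
    for label, pct in groups:
        cumulative_pct += pct
        end = round(cumulative_pct * n / 100)  # assumed boundary rounding
        for idx in order[start:end]:
            labels[idx] = label
        start = end
    return labels


# Hypothetical batter with 20 swings at 61, 62, ..., 80 mph.
speeds = [60.0 + i for i in range(1, 21)]
labels = assign_groups(speeds, SPEED_GROUPS)
print({label: labels.count(label) for label, _ in SPEED_GROUPS})

# 3 height groups x 5 speed groups x 5 swing path groups
print(3 * 5 * 5, "possible combinations")
```

Because the cutoffs are computed inside each batter's own swings, every batter contributes the same share to every group, which is what keeps any one batter from biasing a bucket.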

A QUESTION

Now, we can start asking questions. The first question I will ask is this: Given that we have a pitch in the LOWER group (meaning the 30% lowest pitches faced, batter by batter), how is the whiff rate affected by the speed of the swing and the vertical swing path?

We can certainly make a pretty good guess without seeing the results. On pitches that are on the LOWER side, I'd expect the FLATTEST swings to produce the most whiffs. After all, just to get to those balls, you are going to want to have a golf-like swing to make contact. So, STEEPEST swings fewer whiffs, FLATTEST swings more whiffs. While we know directionally what will happen, what we do not know, until we see the data, is the magnitude of this effect. Most of the truths in our saber work really is about magnitude, rather than direction.

And how about swing speed? How do we think the swing speed affects the whiff rate on pitches in the LOWER group? I think there you can make a decent argument either way. If you are swinging your FASTEST at a pitch that is low, chances are you have confidence in that swing. At the same time, if you are swinging SLOWEST at a low pitch, chances are you may have recognized in time and so are just looking to make contact.

So, what do we actually see here (click to embiggen)? First, we do not even have enough swings to show for the FLATTEST group. That's a pretty good sign that batters will avoid flattening their swings on low pitches. And we do see that whiff rates are highest with the FLATTER group, and lowest with the STEEP groups. The magnitude is quite striking, some 10% to 20% lower whiff rate the steeper the swing. We didn't know that, now we do.

As for swing speed: faster swings indeed lead to fewer whiffs, if you have a steep swing. However, if you have a flatter swing, a slower swing leads to fewer whiffs. The cause-effect here is unclear. On low pitches, you are probably slowing your swing and just making contact, and so you might have a flatter swing. But if you are confident in your swing, you would go for the fast-steep swing. That said, the differences aren't that striking when it comes to swing speed among the low pitches.

Next time, we'll look at pitches in the HIGHER group to see what we can learn.

Sunday, June 02, 2024

Stanton Swing Speed and Acceleration Curves

I took all of Stanton's swings with a launch speed of 95+ and determined when he reached his maximum acceleration.  In his case, he reached max-accel from 7 frames prior to contact back to 13 frames prior to contact.  At 300 frames per second, that translates to 23 to 43 msec prior to contact.  But I'll just talk about frames here. You can see that basically all his swings are the same, and they are just either early or late.  If you shift each curve they'd basically all overlap. (click to embiggen)

Here is what those swings look like based on the swing speed. That red-swing, the early-swing, when he reaches max accel early, his swing speed reaches its max at 6 frames prior to contact (20 msec), and essentially stays there for the duration. As to whether any of this is good or bad, well, we'd have to see his performance for each of these seven groups of swings. At this very moment, I have no idea. But, I'll do that in the comments later today.

UPDATE: Here are the matching images for Arraez

(1) Comments • 2024/06/02 • Bat_Tracking

When does a batter reach his maximum acceleration in his swing?

I'll show you two charts, both very similar.  (click to embiggen)

  • The first is looking only at batted balls that were hit 400+ feet.  As the average HR is about 400 feet, we're essentially treating these as the perfect hits.  
  • The second takes all the swings for each batter's 50% fastest exit velocity.  This ensures proportionate representation (as opposed to the above which is biased toward batters who can hit it deep).

Either way, this shows that the acceleration is maximized from 65 msec prior to the impact time to 25 msec prior.  The swing speed at these two points is about 30-35 mph at the start to about 65 mph at the end.

This is sort of the reason in my prior article that I was using 0 to 30 for the initial acceleration and 30 to 60 for the main acceleration. While we may be tempted to change that to something like 0-35 and 35-65, this will throw out all of those swings where the batter did not even reach 65mph. Take a look at Arraez for example. Even limiting swings that reached 60mph will remove a decent portion of his swings. At 65mph we'll be removing most of his swings.

In any case, we can report all the acceleration values.  The 0-30mph, 30-60mph, as well as the point for each swing where acceleration was maximized.

Here is how Arraez and Giancarlo Stanton look in terms of their acceleration.  Stanton reaches his peak acceleration much earlier than Arraez (about 10 msec).  Though they both start the ramp up at around the same level of speed (29 mph for Arraez,  36 mph for Stanton), Stanton gets to a much higher level (70 mph) than Arraez (57 mph) in the same amount of elapsed time (40 msec).
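
As a rough way to put numbers on that comparison, this Python sketch computes the average gain in speed per millisecond over the 40 msec ramp, using the start and end speeds quoted above. The function name is made up, and treating the ramp as a single average rate is a simplification.

```python
def avg_accel_mph_per_ms(start_mph: float, end_mph: float, elapsed_ms: float) -> float:
    """Average speed gained per millisecond over the ramp."""
    return (end_mph - start_mph) / elapsed_ms


stanton = avg_accel_mph_per_ms(36, 70, 40)
arraez = avg_accel_mph_per_ms(29, 57, 40)

print(f"Stanton: {stanton:.2f} mph per msec")
print(f"Arraez:  {arraez:.2f} mph per msec")
```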

If you are trying to imagine what 40 msec represents: a typical fastball will reach home plate in about 400 msec. So, the acceleration phase of the swing is about one-tenth the time it takes for the ball to reach home plate from the pitcher's hand.

Saturday, June 01, 2024

The five ways to measure the start and/or stop point of a swing

Suppose we decide that the start time of a swing is when the pitcher releases the ball. That seems a natural point to choose. A pitch however can be thrown from 105 mph all the way down to 35 mph. Even if you take a less exaggerated range, we're still talking about a pitch that will reach the front of home plate between 375 msec and 525 msec. That's a range of 150 msec, which basically (almost) allows a batter to check his swing and restart his swing! Clearly, only focusing on a common distance (~53 feet of release) is not going to work.

How about we focus on a common time, say 250 msec from plate crossing? So, regardless of the speed of the pitch, we're saying the batter's swing is dependent on the same amount of time. However, choosing the plate crossing presumes that all batters are trying to make contact at the plate crossing. But batters stand at different parts of the box, facing different sided pitchers with pitches thrown at different trajectories that will reach home plate inside or outside or up or down. Not to mention the ball-strike count affects expectations as well.

What if we take the actual point of intercept (meaning the actual impact point on contacted balls or the point where the ball and bat are closest for whiffs)? This presumes the batter knows what the actual point of intercept will end up being. Working backwards from a known quantity comes with its own issues.

Finally, what is the right way? Well, the closest we can come is try to determine an expected intercept point. Using the identity of the pitcher, the identity of the batter, the ball-strike count, and the batter's location in the batter's box, we can try to predict where the intercept point will end up being for any particular pitch.

Can we come close to the right way in a simple manner? On an aggregate level, the pitcher and ball-strike count will not matter much for any particular batter. So, we can establish a batter's intercept point by looking at all his swings, as LHH or RHH, over the course of the season.

So, in terms of which method to use, I would suggest the most preferred to least preferred is this:

  1. Predict the intercept point by using the variables in play for that particular pitch
  2. Presume the intercept point by using that batter's seasonal average
  3. Treat the actual intercept point on that pitch as the presumed intercept point
  4. Use the front of plate as the intercept point
  5. Use the pitcher release time plus some constant as the intercept point
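
Option 2 in that list is the easiest to sketch. The Python below averages a batter's intercept points over the season, separately by bat side. The record layout (batter, bat side, forward distance in inches), the names, and the numbers are all hypothetical, and using a plain mean is an assumption.

```python
from collections import defaultdict
from statistics import mean


def seasonal_intercept_points(swings):
    """Average forward intercept distance per (batter, bat side).

    swings: list of (batter, side, forward_distance_in) tuples.
    """
    by_key = defaultdict(list)
    for batter, side, forward_in in swings:
        by_key[(batter, side)].append(forward_in)
    return {key: mean(distances) for key, distances in by_key.items()}


# Hypothetical swings; "Batter B" is a switch hitter.
swings = [
    ("Batter A", "R", 28.0),
    ("Batter A", "R", 32.0),
    ("Batter B", "L", 40.0),
    ("Batter B", "R", 36.0),
    ("Batter B", "R", 38.0),
]

points = seasonal_intercept_points(swings)
for key, value in points.items():
    print(key, value)
```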

Friday, May 31, 2024

Statcast Lab: Pre-introducing Bat Acceleration

Acceleration is a tough nut to crack. Not so much in terms of finding the acceleration curve, which you can see here as an example (and its derivative, the confusingly named jerk). But rather, in how to present acceleration.

For a pitch, it's quite straightforward: once the ball is released, the ball is, essentially, traveling at a constant deceleration. Not exactly but close enough. That allows us to create metrics quite easily from it, like the break of a pitch.

A bat is different, because it is constantly increasing in speed, and doing so at a non-constant rate. In other words, the acceleration is constantly going up... until it invariably starts to be reduced (though still positive). The tangent of the acceleration, the jerk, is somewhat constant. But by that point, most people are not going to understand what that even means.

So, we turn to runners. We can think of out-of-the-block, we can think of burst, we can think of cruising speed. We can also turn our attention to cars, going 0 to 60 in X.Y number of seconds.

Here's ONE approach. Suppose the first critical speed point is 30 mph. Maybe it's 40 mph, since that's closer to the max speed of a successful checked swing. But, let's go with 30 mph for now. Ideally, a batter is going to take as long as possible before they even get to 30 mph and rely on their acceleration. Or maybe, ideally a batter is going to want to get to 30 mph as fast as possible because they don't have the acceleration. Or, well, who knows right now. Let's take a look at the data. (Click to embiggen)

I always look for Giancarlo Stanton first, so I can understand what I am seeing. And there he is, in 2023 and 2024, with Jason Heyward. This is what I call the Lambo Swing: they immediately ramp up to 30 mph as fast as they can, and they go 30 to 60 as fast as anyone. In the case of Stanton, he just keeps going to 80+.

The next name I look for is Luis Arraez. And there he is, a Slow and Steady Swing: takes his time to get to 30 mph, and then a slow acceleration to 60 mph. And that's pretty close to his final speed.

In the top left quadrant is the Kokomo Swing: they get to 30 mph as fast as possible, but are pretty slow to 60 mph. Altuve, JD Martinez, Arenado are all there. Maybe it's batters that are just getting old, and so, are relying on their experience to start their swing as early as possible, because they don't have the acceleration to sustain it? We'll see.

Finally, the bottom right quadrant is the Pants on Fire Swing. They start their swing as late as possible and then explode from 30 mph to 60 mph as quickly as possible. Jo Adell, Corbin Carroll are representative here. So, this is probably what I think batters are after: taking as much time as possible to size up the pitch, then relying on their explosiveness to get to 60 mph as fast as they can.
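The four quadrants can be sketched in a few lines of Python. The chart's actual split points are not given here, so the `split_to_30` and `split_30_to_60` cut points (say, league medians, in msec) are hypothetical inputs.

```python
def swing_style(time_to_30, time_30_to_60, split_to_30, split_30_to_60):
    """Label a swing by quadrant.

    time_to_30:     msec taken to reach 30 mph
    time_30_to_60:  msec taken to go from 30 mph to 60 mph
    split_*:        hypothetical cut points separating "quick" from "slow"
    """
    quick_start = time_to_30 < split_to_30
    quick_ramp = time_30_to_60 < split_30_to_60
    if quick_start and quick_ramp:
        return "Lambo"            # fast to 30, fast 30 to 60
    if quick_start:
        return "Kokomo"           # fast to 30, slow 30 to 60
    if quick_ramp:
        return "Pants on Fire"    # late to 30, fast 30 to 60
    return "Slow and Steady"      # late to 30, slow 30 to 60
```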

Are there other ways to describe acceleration? Sure. Instead of measuring elapsed time between two fixed speeds, we can instead measure change in speed between two fixed timestamps. For example, maybe we look for the change in speed from 70 msec prior to the intercept point to 40 msec prior.

Or, instead of two fixed points in time, maybe it's any 30 msec window where we can find the maximum increase in speed.
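Here is a rough Python sketch of all three descriptions, assuming a swing arrives as a list of `(t_msec, speed_mph)` samples sorted by time, with the intercept point at t = 0. That layout, the function names, and the linear interpolation between samples are my assumptions for illustration.

```python
def speed_at(curve, t):
    """Linearly interpolate the bat speed at time t (msec)."""
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")


def time_at_speed(curve, target):
    """First time (msec) the bat reaches `target` mph on the way up."""
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if v1 > v0 and v0 <= target <= v1:
            return t0 + (t1 - t0) * (target - v0) / (v1 - v0)
    raise ValueError("target speed is never reached")


def elapsed_between_speeds(curve, low=30.0, high=60.0):
    """Elapsed msec between two fixed speeds (the 30-to-60 idea)."""
    return time_at_speed(curve, high) - time_at_speed(curve, low)


def speed_change(curve, t_start=-70.0, t_end=-40.0):
    """Change in mph between two fixed timestamps before the intercept."""
    return speed_at(curve, t_end) - speed_at(curve, t_start)


def max_gain_in_window(curve, window=30.0):
    """Largest mph gain over any `window` msec span.

    Windows are only started at sample times, so this is an approximation.
    """
    last_t = curve[-1][0]
    return max(
        speed_at(curve, t + window) - v
        for t, v in curve
        if t + window <= last_t
    )
```

All three are readings of the same speed curve, so they can be compared on the same swings to see which one separates batters best.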

We'll try different methods to see what we can learn.

(3) Comments • 2024/06/01 • Bat_Tracking

Friday, May 10, 2024

Statcast Lab: Pinnacle of Sabermetrics merges Performance Analysis and Scouting

Theo Epstein had a great line some 15 or 20 years ago, paraphrasing: in order to see better, he needs glasses with one lens focused on performance analysis and the other lens on scouting. He needed both to see clearly.

One of the things that scouting entails (among many other things) is a focus on tools: how fast someone runs, or throws, or in the case of batters, swings.  (Again for you speed readers: I'm only talking about one small facet of scouting.) Most of us are focused on the end results (in the form of say wOBA), and then reverse-engineering, or inferring, what that means.  Someone hits 50 HR, we infer they have a high bat speed.  Someone hits 0 HR but has a .350 batting average, and we infer they have a low bat speed, but they square up the ball a lot.  50 HR with 200 strikeouts and maybe they swing hard all the time.  50 HR with 50 strikeouts and maybe they swing hard and make great contact.

Instead of inferring what the batter might be doing that leads to those results, we can now use a new data point: bat speed.  We no longer need to know if they swing hard or not.  We now know.  And by this time next year, we will know if their year to year bat speed went up or down.

For this little study, I'm going to presume that the batter's bat speed applies to his whole career (since 2015).  I am going to use three data points: wOBA, xwOBA, and bat speed.  I will be correlating each to next season's target, which is wOBA.  It's important to note that wOBA includes walks and strikeouts.  

First off, xwOBA does the best, with a correlation of r=0.446.  wOBA comes next at r=0.407, while bat speed comes in last at r=0.224.  On the one hand: that seems low.  On the other hand: that seems high.  After all, wOBA and xwOBA use the combination of everything the batter does (his swing, his approach, his results, etc), and so, we'd expect them to correlate well with next season's wOBA.  But bat speed is just... bat speed.  To just be given that number, and get to an r=0.224 is actually pretty impressive.

What really matters though is if bat speed gives us EXTRA information, beyond what we already know in wOBA and xwOBA.  First, when we use both wOBA and xwOBA, our correlation goes up to r=0.450.  Remember, we got 0.446 with just xwOBA on its own.  Including wOBA barely moves us forward.  In other words, xwOBA, which focuses on launch speed and angle, already does such a great job in describing the batter that we don't really need their result in the form of wOBA.

See, what happens is that xwOBA removes a layer from wOBA: it removes all the parks and fielders and Random Variation that comes with that.  Most of that is really noise and so, adding wOBA to xwOBA doesn't really help us.  We just needed xwOBA.

Now, what about bat speed?  What if we look at xwOBA and bat speed?  Well, in that case, our correlation goes to r=0.455. That is higher than xwOBA + wOBA.  That's right, given the choice between xwOBA and wOBA, or between xwOBA and bat speed, it's the latter that is preferred.  (Insofar as this little test suggests.)

Remember, think of it in terms of layers.  One layer removed from wOBA is xwOBA.  Then, one layer removed from xwOBA is bat speed.  Bat speed leads to launch speed, which is the key ingredient of xwOBA.  And xwOBA leads to wOBA.  The more layers you peel back, the more you get to the core of the batter themselves.

And to finish off this little study: wOBA and bat speed give us an r=0.429, which is even less than xwOBA on its own.

All three give us an r=0.460.
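For those who want to see how a combined correlation behaves, here is a small Python sketch. It assumes the combined figures come from an ordinary linear regression on two predictors (the method is not spelled out above), in which case the multiple correlation follows from the three pairwise correlations. The correlation between the two predictors is not reported here, so the `r_12` argument has to be supplied by the reader.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a target on two predictors.

    r_y1, r_y2: correlation of each predictor with the target
    r_12:       correlation between the two predictors (not given in the
                post; must be supplied)
    Standard two-predictor formula for a least-squares fit.
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)
```

Under that assumption, adding a second predictor can never lower the in-sample correlation below the better single predictor. So the interesting question is only how much each addition raises it, which is why 0.446 going to 0.450 reads as "barely moves us forward".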

One word about xwOBA: it is a descriptive metric, not predictive.  If I wanted to make it predictive, I would have done so.  I would have given a POSITIVE weight to a high launch speed, high launch angle popout.  In reality, xwOBA, a descriptive metric, gives this a very NEGATIVE value.  As it should.  But, as a PREDICTIVE metric, this would get a very positive value.  Why?  Because hitting a 100 mph, 60 degree popup takes A LOT of power.  It's a sign that the batter has... high bat speed.  That's the inference we can make.  Of course, now that we have bat speed, we no longer need to make that inference.

Next time, I will look to see how much a batter squaring up the ball does to predict wOBA.  I don't know the answer yet.

Wednesday, May 01, 2024

Statcast Lab: Switch Hitters and Swing Speed

(Click to embiggen)

The x-axis shows the difference in swing speed for switch hitters.  Players on the far right, like Jose Ramirez, swing much harder as a RHH than LHH.  Players on the far left, like EDLC swing harder as a LHH than RHH.

The y-axis shows the difference in wOBA, translated to Runs per 700 PA.  Players on top, like Robbie Grossman, perform much better as a RHH.  Players on bottom, like EDLC perform much (much much) better as a LHH.

In the red box are players with reverse-splits: they perform better from one side, though swing harder on the other side.  As you can see, these are unusual players.  Robbie Grossman hits much better as a RHH, even though he swings harder as a LHH.  

In the blue box are players with matching-splits and extreme gaps in swing speeds: EDLC for example performs far far better as a LHH.  And, not coincidentally, he swings harder as a LHH.  As you can see, there are many more switch hitters who both perform much better as a RHH and swing harder as a RHH.  The players in the blue box are candidates to stop switch hitting. 

Batters in the middle across have a gap in the swing speeds, but no gap in performance.  They may have figured out how to compensate their game.  Tommy Edman is on the cusp here.  He swings far harder as a RHH, and just has a modestly higher performance as a RHH.

Batters in the center down have a gap in performance, but no gap in swing speed.  If there is a reason that Ozzie Albies performs much better as a RHH, it's not tied to his swing speed as LHH and RHH.

(7) Comments • 2024/05/28 • Bat_Tracking

Friday, February 16, 2024

Statcast Lab: Do some batters overswing?

On his 30% weakest swings, LHH Luis Garcia (Nationals) generated 2 runs per 100 swings above average.  On his 30% hardest swings, he generated 7 runs per 100 swings below average.  He led MLB in terms of that gap in performance.  Can we say he overswings?  I don't know, we'd have to look at each of his swings to see why the results came out as they did.  But he clearly performed better when his swings were the weakest.

On the flip side are batters who far far exceeded their performance on their hardest swings compared to their weakest swings.  Among this group are Ohtani and Yordan Alvarez, who are each around 13 runs above average on their hardest swings and 4 runs below average on their weakest swings.  (League average is +0.5 and -5.0 runs per 100 swings, respectively.)
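A minimal Python sketch of that split, assuming each swing is a `(speed_mph, run_value)` pair for one batter, with the run value already expressed relative to average. The pair layout and the function name are mine, for illustration only.

```python
def runs_per_100_by_speed_tail(swings, share=0.30):
    """Run value per 100 swings on a batter's weakest and hardest swings.

    swings: list of (speed_mph, run_value) pairs for one batter
    share:  fraction of swings in each tail (30% in the post)
    Returns (weakest_per_100, hardest_per_100).
    """
    ordered = sorted(swings, key=lambda swing: swing[0])
    k = max(1, round(len(ordered) * share))  # swings in each tail

    def per_100(group):
        return 100 * sum(run_value for _, run_value in group) / len(group)

    return per_100(ordered[:k]), per_100(ordered[-k:])
```

The gap between the two returned numbers is what would flag a possible overswinger, subject to the checked-swing caveat that follows.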

Of course, you have to be careful here, since a batter is going to potentially check his swing (unsuccessfully), and so the swing speed is not necessarily some sort of independent variable to his approach.

Click to embiggen.


UPDATE: Here is the distribution in speed, as well as the run values, for Garcia and Ohtani. Obviously, Ohtani is in blue. At 81+ is when Ohtani is doing the damage. Garcia you can see had some success at under 68. However, given the combo of 67+68 is a net negative, it may very well be that that is just before-the-fact cherry-picking. That said, Garcia at 74+ or 76+ is a net negative, and it may very well be that he overswings.

(2) Comments • 2024/02/16 • Bat_Tracking

Thursday, February 15, 2024

Statcast Lab: Swing Speed Distributions by Pitch Types

(click to embiggen)

(1) Comments • 2024/02/16 • Bat_Tracking

Friday, January 19, 2024

Statcast: How credible are swing speeds for batters?

A typical batter will have about 1.85 swings per plate appearance, of which 90% are competitive swings (excluding half-swings and failed checked swings, etc). At 600 plate appearances, that comes out to 1000 competitive swings. Suppose you take a random sample of 100 swings? How representative of their true swing speed would that be? As you can imagine, it would be incredibly high. Now, what about 50 swings? 20? 10? What is the credibility level?

What I did was very straightforward: I took 100 random swings for a batter, and correlated to 100 other random swings for that batter. I did that for every batter with at least 200 swings. The correlation came in at r=0.98.

I ran this with 99 swings (for batters with at least 198 swings) and 98 and on and on down to 1 swing (min 2 total swings). Correlation at r=0.95 happened at only 33 swings. Correlation at r=0.90 happened at only 17 swings. Correlation at r=0.80 happened at only 7 swings.
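The procedure can be sketched in Python as follows. This is a sketch of the idea rather than the code actually used: the `swings_by_batter` mapping (batter to list of swing speeds) and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_sample_r(swings_by_batter, n, rng):
    """Correlate the mean of n random swings with the mean of n other
    random swings, across all batters with at least 2n swings."""
    firsts, seconds = [], []
    for speeds in swings_by_batter.values():
        if len(speeds) < 2 * n:
            continue  # not enough swings for two disjoint samples
        sample = rng.sample(speeds, 2 * n)  # drawn without replacement
        firsts.append(mean(sample[:n]))
        seconds.append(mean(sample[n:]))
    return pearson(firsts, seconds)
```

Calling `split_sample_r(data, n, random.Random(0))` for n from 100 down to 1 would trace out a curve like the one described, though a single random draw per batter is noisy at small n.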

Here's how the chart looks for every point from 1 to 100 swings (those are the blue dots). Click to embiggen.

The orange line is the regression amount, the ballast, the amount of league average swings to add. For you Bayesians out there: that's the prior amount you'd add to the Beta Distribution. As you can see, this number hovers at just under 2 swings. In other words, after 2 swings, the average swing speed of the batter in question is half-real.  We can therefore say the Credibility Level is just under 2 swings.

The dotted line is the Reliability Level: swings / (swings + 1.8). While not as credible as pitch speed, swing speed is not far off.
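In Python, the reliability formula and the matching regression toward the league mean look something like this. The 1.8 ballast comes from the formula above; the league mean used in any example is a placeholder, not a published figure.

```python
BALLAST_SWINGS = 1.8  # league-average swings to add, from the formula above


def reliability(swings, ballast=BALLAST_SWINGS):
    """Reliability Level: swings / (swings + ballast)."""
    return swings / (swings + ballast)


def regressed_speed(observed_mean, swings, league_mean, ballast=BALLAST_SWINGS):
    """Blend a batter's observed mean swing speed with the league mean by
    adding `ballast` league-average swings to his sample."""
    return (observed_mean * swings + league_mean * ballast) / (swings + ballast)
```

Plugging in 7, 17, 33, and 100 swings gives roughly 0.80, 0.90, 0.95, and 0.98, in line with the correlations reported above.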

(3) Comments • 2024/01/22 • Bat_Tracking

Wednesday, December 27, 2023

Is Spencer Torkelson confident, or over-confident, in his swing?

And does Altuve abandon his swing too often?  

I don't know.  But to help get us there, we can look at how often a batter has a full swing, at each plate location and ball-strike count (click to embiggen).  The first set of numbers is the league average. I (for now anyway) define a full swing as follows: take a batter's 50% fastest swings, take that average, subtract 10 mph, and that's the minimum threshold of swing speed for a full swing.  Anything below that is an abbreviated swing.  League average is about 10% abbreviated swings.
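That full-swing definition is easy to sketch in Python. How to split an odd number of swings into a "fastest 50%" is not specified, so rounding down is my assumption here.

```python
from statistics import mean


def full_swing_threshold(speeds, margin_mph=10.0):
    """Minimum speed for a full swing: the average of the batter's
    fastest half of swings, minus the margin."""
    half = max(1, len(speeds) // 2)  # assumption: round down on odd counts
    fastest_half = sorted(speeds, reverse=True)[:half]
    return mean(fastest_half) - margin_mph


def full_swing_rate(speeds, margin_mph=10.0):
    """Share of swings at or above the batter's own threshold."""
    cutoff = full_swing_threshold(speeds, margin_mph)
    return sum(speed >= cutoff for speed in speeds) / len(speeds)
```

Because the threshold is built from each batter's own fastest swings, a slow swinger and a fast swinger are each judged against their own norm.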

The second set of numbers is Spencer Torkelson, who at 95% of his swings as full swings is among the league leaders.  That he is also among league leaders in strikeouts is not a coincidence.  The last set of numbers is Jose Altuve, who at 80% of his swings as full swings is among the league lows.  That he is among the league-lows in strikeouts is also not a coincidence.  Also note that he reserves his abbreviated swings especially for 2-strike counts, to a much larger degree than league average.

Tuesday, December 26, 2023

Are batters confident or over-confident on ball-strike counts that favour the batter?

Look at this chart. You will notice that batters, when a pitch is in The Heart of the Plate, have the slowest swing speed at 0-2 counts (70.1 mph) and fastest swing speed at 3-0 counts (74.4 mph).  Indeed, at EVERY count, the more balls, the higher the speed, and the more strikes the lower the speed.  Roughly speaking, with every ball, the speed increases by 0.5 mph, and with every strike, the speed decreases by 1 mph.  That's for The Heart of the Plate.

This directional progression (though not the same magnitude) is maintained when the pitch is in The Shadow Zone as well as the Chase Region.  It's only in the Waste Region where the ball-strike count does not matter.

While this progression makes sense in the Heart of the Plate, it makes no sense in the Chase Region.  At this point, the pitch is at least a few inches off the plate.  At a 3-0 count, there's no (good) reason for the swing speed to be at 71.9 mph, while it is 64.2 mph at 0-2.  This is a good sign that the batter is being overly aggressive at 3-0 in the Chase Region.

We can learn more by looking at the Run Values by location and count.  Focus on the Swing columns, and start with Heart of the Plate.  Swinging at 0-2 is providing far more benefit than swinging at 3-0, when the pitch is in the Heart of the Plate.  Even though the batter is swinging less hard.  Indeed, if you follow the progression, it is almost a complete reverse of the speed progression: the more strikes, the better the batter is doing on swings, while the more balls, the worse the batter is doing.  

My initial guess is that swinging at 0-2 at a pitch in the Heart of the Plate has the batter with a more defensive swing, hence the lower speed.  And at 3-0, the batter is more aggressive, not worrying about any swing-and-miss, since the worst case puts them at 3-1.  However, overall, this is not working out.

Naturally, not all batters are going to behave the same way.  I am sure if we look at the best and smartest batters, like Juan Soto and Luis Arraez for example, we'll likely learn what the more optimal approach should be.

What I'd like to learn is if this batting approach ability is something that can be taught, or is it something that pitchers will exploit in a batter early on, and thereby doom that batter to a shorter career.  So much to learn...

Monday, December 25, 2023

Swing Speed, By Plate Location and Count

One of the very first things I did with Statcast data was break up the plate location into zones, beyond just in/out. Humans have a terrific grasp of nuances, and so, we should lean on those nuances. Instead, too often (much too often), we think in binary terms or worse, we categorize things in binary terms.  But rarely are things binary. 

Take the strike zone.  There's a difference between a pitch thrown in the heart of the plate, and another one that is just inside the edge of the strike zone.  The batter, pitcher, catcher, umpire all respond to that nuance.  And so, to simply say "in the strike zone" loses that flavour.  And so, I split up that strike zone into Heart of Plate and Shadow-In.  Even pitches outside the strike zone should be separated.  There's a difference between a pitch just outside the strike zone, and one that is way outside.  For pitches in Shadow-Out, the batter is just as likely to swing as to take a pitch.  There's a Waste region where the batter is rarely fooled, and so will rarely swing.  And between the two is the Chase region, a region where a good batter can lay off a pitch, while a bad batter will swing much too often.

With the forthcoming (do not ask me when) data on swing speeds, we can actually track the behaviour of the batter: how fast do they swing based both on the plate location and count?  Well, here it is (click to embiggen):

(2) Comments • 2023/12/26 • Bat_Tracking

Saturday, December 23, 2023

Swing Speed: Arraez v Acuna

Acuna has a swing speed of 77.4 mph, one of the fastest in the league.  Arraez has a swing speed of 63.8, one of the slowest in the league.  When we limit each of their swings to their personal 90% fastest swings (meaning we drop their 10% slowest), here is how their distributions stack up (click to embiggen).

As you can see, their shapes are similar, but just shifted over by 13-14 mph.  Notice that around 67-73 mph they overlap: Arraez at his fastest 20-25% of swings is Acuna at his 20-25% slowest of swings.  

Now, look what happens when we show the run production by swing speed:

Arraez is overall -4 runs on swings.  But at 67+, he is a healthy +7 runs (and naturally -11 runs below 67 mph).  Acuna on the other hand is at his worst at under 76 mph, -6 runs, while he is a superlative +16 runs at 76+.

As you can see, Arraez at 67-73 and Acuna at 67-73 are totally different.  Arraez at his top speed means he did everything he wanted to do, while Acuna at his low speed is an indication of something going wrong.  That's why you can't just look at swing speed on its own: it really needs to be evaluated based on that batter's swing distribution.

More to come...

(5) Comments • 2023/12/25 • Bat_Tracking

Friday, September 22, 2023

Bat Swing Checklist

Sharing some links from 2023 and 2024 as they relate to bat swings:

(2) Comments • 2024/06/13 • Bat_Tracking