Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Ball_Tracking

Tuesday, June 04, 2024

Statcast Lab: Vertical Swing Angles

(Click to embiggen)

The above shows three different angles, all related to this HR by Adolis Garcia.

Let's start with the blue line on top.  That is what we call the Vertical Bat Angle.  We compare the position of the head of the bat to the position of the handle of the bat. If the head is above the handle, then it has a positive vertical angle.  The head below the handle has a negative vertical angle.  Naturally, when the head and handle are both parallel to the ground, then the vertical angle is zero.  If you watch the video, you can see that the bat is parallel to the ground a little bit before contact and a little bit after contact. 

The green line in the middle is the Vertical Attack Angle.  Whereas the blue line measures bat position in 2D space, the green line measures bat velocity in 2D.  In other words, the green line measures the direction of the bat.  You can see that at the point of impact, the bat has its velocity moving in an upward direction. 
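As a rough sketch of the two angles described so far: the bat angle comes from positions (head versus handle), and the attack angle comes from the velocity vector. The function names, the (x, y, z) coordinate convention with z pointing up, and measuring against the horizontal plane are all assumptions made for illustration, not the actual Statcast implementation.

```python
import math

def vertical_bat_angle(head, handle):
    """Angle (degrees) of the bat relative to the ground.

    head, handle: hypothetical (x, y, z) positions, z is up.
    Positive when the head is above the handle, zero when level.
    """
    dx, dy, dz = (h - k for h, k in zip(head, handle))
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))

def vertical_attack_angle(velocity):
    """Angle (degrees) of the bat's direction of travel.

    velocity: hypothetical (vx, vy, vz) of the bat, z is up.
    Positive when the bat is moving upward.
    """
    vx, vy, vz = velocity
    return math.degrees(math.atan2(vz, math.hypot(vx, vy)))

# Head level with the handle: zero bat angle
level = vertical_bat_angle(head=(2.0, 0.0, 3.0), handle=(0.0, 0.0, 3.0))
# Head below the handle: negative bat angle
below = vertical_bat_angle(head=(2.0, 0.0, 2.0), handle=(0.0, 0.0, 3.0))
# Bat moving forward and slightly up: small positive attack angle
upward = vertical_attack_angle((0.0, 100.0, 10.0))
```

The point of keeping them as two functions is that they answer different questions: one is where the bat is, the other is where the bat is going.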

Finally, the orange line is the Vertical Path Angle, the Swing Plane.  Once the bat approaches the intercept point, the bat is essentially moving in a single plane.  If you can imagine a (tilted by 30 degrees) sheet of paper, the bat is passing through that sheet of paper, and it does so from about 30 msec prior to the Intercept Point, and onwards beyond the intercept point.

I know that all this is not very obvious.  The analogy I make is to consider a golf swing.  The Vertical Path Angle, the Swing Plane, approaches closer to 90 degrees (maybe it's 70, I don't know, someone out there can tell us).  The Vertical Attack Angle is similar to a baseball swing, eventually going to a small positive angle.  And naturally the Vertical Golf Angle starts off at a huge positive angle on the backswing, down to a huge negative angle (approaching that 70 or 90 degree angle of that Swing Plane), and continuing back on a huge positive angle.

Anyway, I hope some of that made sense. We'll make it make more sense next time.

Wednesday, March 20, 2024

Statcast: Update to Catcher Framing

We made an update to the process, with a big payoff at the pitch-level, and an overall modest impact on the catcher framing.  The current method breaks up the area over the plate into 5 regions, with the prominent ones being the Shadow-In (80% called strike rate) and Shadow-Out (80% called ball rate), with adjustments for pitcher and venue.  The new method updates the Shadow Zone process so it is a continuous probability from 0 to 100%, using the specific plate location, with adjustments for bat-side and pitch-hand.  Statcast Data Whiz Taylor did the bulk of the work here.

At the aggregated seasonal level, you won't see much difference.  Current Savant and Steamer at Fangraphs, for 2023, have a correlation of r=0.94.  This will increase to r=0.98 with the new model.  The current Savant process would apply adjustments at the aggregated level.  We did this because we never thought that we'd need to show the strike probability on a pitch by pitch basis.  And since Catcher Framing was one of the very first metrics we created, it languished in this regard.  But thanks to Taylor and their team, a process was built to apply adjustments at each pitch.  By doing that, it will allow us to slice/dice the data the way we do with other data, like Catcher Blocking and Throwing, etc.

Here is what the binned data (100 bins) looks like, comparing the predicted strike rate with the actual called rate. (click to embiggen)
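For readers who want to build that kind of comparison themselves, here is a minimal sketch of binning pitches by predicted strike probability and comparing each bin's average prediction with its actual called strike rate. The function name and input layout are hypothetical, and the bins here are equal-width probability bins, which is an assumption; the post does not say how the 100 bins were formed.

```python
from collections import defaultdict

def calibration_bins(predicted, called_strike, n_bins=100):
    """Group pitches into n_bins by predicted strike probability.

    predicted: probabilities in [0, 1]
    called_strike: 1 if the pitch was called a strike, else 0
    Returns {bin_index: (mean_predicted, actual_rate, n_pitches)}.
    """
    acc = defaultdict(lambda: [0.0, 0, 0])  # sum of predictions, strikes, count
    for p, s in zip(predicted, called_strike):
        b = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the top bin
        acc[b][0] += p
        acc[b][1] += s
        acc[b][2] += 1
    return {b: (sp / n, k / n, n) for b, (sp, k, n) in sorted(acc.items())}

# Made-up pitches purely to show the shape of the output
bins = calibration_bins(
    predicted=[0.805, 0.805, 0.805, 0.805, 0.205],
    called_strike=[1, 1, 1, 0, 0],
)
```

A well-calibrated model has the first two numbers in each bin close to each other.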


Monday, January 15, 2024

Statcast: Location of Catcher influences the called strike rate

Umpires are human. Catchers are human. Humans respond to stimulus.

The typical kinds of stimuli are light, heat, physical exertion. Everyone responds differently because everyone is different. The most important thing to remember when you apply sabermetrics is that people are human.

How do people respond to taking a snapshot of a 3-inch object moving at 90 miles an hour? Exactly, everyone will be different. A pitcher will throw a ball with speed and movement. A batter will move a certain way before taking a pitch. The catcher will catch that ball a certain way. And an umpire, faced with all these stimuli, and using the batter's stance as a frame of reference, along with the home plate, will then make a judgement call as to whether this ball was through the strike zone or not. This is hard to do.

Now, let's look at RHH facing RHP, at 4-seam fastballs that end up outside, two to three feet off the ground. I select that height so that the focus will be purely on side-to-side. At that height, it doesn't matter if it's Altuve or Judge batting.

For pitches that end up outside, I create three different regions of outside.

  • The first is pitches that are on the outside part of the plate. So, still a strike, but just barely. These are pitches that are 8 inches from the center, plus/minus 1.5 inches.
  • The second is pitches that are just outside of there, enough for part of the ball to maybe catch part of the strike zone. These are pitches 11 inches from the center of the plate.
  • Finally, the third set is pitches just outside of that, and so are 14 inches, plus/minus 1.5 inches.

Got all that? Just put three balls stacked next to each other, starting with the outside part of the plate, and continuing going out from there. These balls are at 8, 11, and 14 inches from the center.  A ball is almost 3 inches wide.

Does where the catcher positions himself when catching an outside pitch matter, in terms of getting the called strike? (click to embiggen)

First, let's start with the easy one, the pitches that are 14 inches from the center, and should have a 0% called strike rate. In reality, they are called strikes 5% of the time. Indeed, whether the catcher is located on the inside or outside part of the plate is irrelevant: the pitch is far enough outside that it doesn't matter. It's a called strike 5% of the time.  This is the blue line above.

Let's take the almost easy one, pitches that are over the outer part of the plate. When the catcher is positioned on the outside half of the plate, those pitches are called strikes 98% of the time. It should be 100%, but 98% is pretty good.  This is the red line above.

However. However, if the catcher is located inside, the called strike rate goes below 90%. These are the pitches that the catcher was expecting to be inside, the pitch goes outside, and the catcher has to dart out to catch it. It looks ugly. And over 10% of the time, this is enough stimulus to fool the umpire. Umpires are human, just like you. 10% of the time they are wrong, and that's with years of experience. You'd be wrong even more.

Finally, let's look at those 50/50 pitches, the green line. This is where it matters the most. A catcher located on the outer half of the plate, and they get a pitch that is otherwise a 50/50 pitch will be called a strike 65% of the time. But a catcher located inside will instead only get a called strike 40% of the time. Remember, same location either way, but the catcher being outside will get the call 65% of the time, while being inside will get the call 40% of the time. Humans responding to stimulus. This is the result.

Statcast: How to calculate the Vertical Approach Angle (VAA)

The angle is based on the ratio of the components of the velocity vector. You start by taking the z component (the up/down), and divide it by the remaining components (side to side and front-back). For the remaining components (x, y), you simply apply pythag: sum the squares, then take the square root. After you have that ratio, you take the arctan to give you the angle (in radians, which you can then convert into degrees).

Let's walk through an example. Suppose you have these velocity components (x,y,z): 4, -125, -20. The units won't matter, since they will cancel out, but if you must know, these are in ft/s. You start by combining the x,y components. The pythag of 4 and -125 is 125.06. As you can tell, virtually all (but NOT ALL) of the velocity is in the y direction, which is from mound to home. Naturally, there will be some amount of speed that is side-to-side, but when you are talking about pythag, a triangle, you can see the hypotenuse and the longest side are virtually the same.

Next, we take the downward value, -20 and divide by 125.06. That gives us -0.16. That's the ratio of the vertical velocity compared to the horizontal velocity. This number is naturally always going to be small.

Next, take the arctan of this small number, which gives you... a similar small number of -0.159. Again, when you are dealing with small numbers, the ratio of the sides and the angle (in radians) are going to be very very similar. They are basically both approaching zero. The closer the ratio is to 0, the closer the angle becomes 0.

The final step is to turn it into human numbers, converting radians into degrees. You do that by multiplying by 180/PI(), or 57.296. Fun fact: some software (I'm looking right at you BigQuery) doesn't have a PI() function, so you can use acos(-1), which is PI(). There's gotta be ONE BigQuery developer out in Google land that also likes sabermetrics. So, if you want to make me happy, just create a PI() function please. Anyway, -.159 in radians is -9 degrees.
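Here are the steps above as a short Python sketch, using the same example velocity components (4, -125, -20). The function name is just for illustration.

```python
import math

def vertical_approach_angle(vx, vy, vz):
    """VAA in degrees from velocity components (any consistent units)."""
    horizontal = math.hypot(vx, vy)   # pythag of the x and y components
    ratio = vz / horizontal           # vertical over horizontal velocity
    radians = math.atan(ratio)        # arctan gives the angle in radians
    return radians * 180 / math.pi    # convert to degrees

horizontal = math.hypot(4, -125)               # about 125.06
vaa = vertical_approach_angle(4, -125, -20)    # about -9 degrees

# The acos(-1) trick for software without a PI() function
pi_from_acos = math.acos(-1)
```

Since the ratio is small, skipping the arctan entirely and just multiplying the ratio by 57.296 would land within a tenth of a degree here, which is the point made above about small angles.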

I bring all this up because Eli noted that instead of calculating the VAA at plate crossing, we should be calculating the angle at the point where it correlates the most with the thing we are interested in, which in this case is whiff rate.

He ends up concluding that we should take the angle at 13 feet behind home plate. Now, why would that be? I think it's simply a question of variation, which is really what correlations are about. Here's an image from Alan Nathan's calculator that we extend to 20 feet beyond home plate, and 3 feet underground. The scales are obviously exaggerated. That red line, the release angle, is actually just minus 6 degrees. If I were to extend a tangent line starting at plate crossing, it'll be minus 9 degrees in this image (the VAA). If you look at the range in the release angles in reality, and compare it to the range in approach angles (at plate crossing), you will see the range is about 1.4X at plate crossing. So, wider variation, the more you take the timestamp away from release point. You can thank gravity for that. (Though when I look only at 4-seamers, the variation is greater at release.)

I think this is what Eli is capturing, some combination of speed and release angle, but I may be wrong. The next step really is to focus on common speeds, say look at all fastballs thrown at 93-94 for example. My expectation is that it wouldn't matter where you are going to measure the angle.


Tuesday, July 18, 2023

Drag values: standard deviation v mean

Back before the start of the 2022 season, we released the Drag Dashboard. Its calculation relied heavily on work from Alan Nathan and David Kagan. Our focus is on four-seam fastballs. Recently, we included a chart showing the full distribution of pitches year by year going back to 2016. (Click to embiggen)

There are two principal reasons that the range in the drag calculation is so wide. The first is that each ball is unique like a snowflake. In other words, there isn't "one" ball being used, but hundreds of thousands of balls: "This intra-season variability in drag is attributable, in part, to the fact significant parts of the baseball are constructed by hand." The other is simply the uncertainty in the calculation of the drag coefficient itself. That said, this uncertainty should be quite similar year to year, and so looking at the year to year variability in the drag calculation should be telling. (Click to embiggen)

One look at the chart and we are struck with two key takeaways. The first is how stable we are in the 2020-2023 time period compared to the 2016-2019 time period. The other is how stable intra-season we are in the more recent seasons, especially 2023.

So, let's look at the data and see what it actually shows. Here are the results of the entire distribution of pitches, year by year. So, it's simply the standard deviation of all pitches:

year  num_pitches  sd_of_cd  avg_of_cd
2023       134557    0.0250      0.341
2022       235136    0.0270      0.347
2021       251084    0.0258      0.341
2020        91111    0.0257      0.341
2019       262088    0.0237      0.328
2018       252198    0.0248      0.337
2017       248753    0.0253      0.335
2016       253524    0.0273      0.346

It's not a surprise that the lowest standard deviation is also in the year that has the lowest mean, and the two highest standard deviations are the two highest means. Here's how it looks when we plot the standard deviation against the mean. (click to embiggen)
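One way to put a number on "spread relative to what the mean would suggest" is to fit a straight line of standard deviation against mean across the eight seasons in the table, then look at how far each season sits from that line. The straight-line fit is an assumption made for this sketch; the post does not say how the expectation was formed.

```python
# (year, avg_of_cd, sd_of_cd) copied from the table above
SEASONS = [
    (2023, 0.341, 0.0250),
    (2022, 0.347, 0.0270),
    (2021, 0.341, 0.0258),
    (2020, 0.341, 0.0257),
    (2019, 0.328, 0.0237),
    (2018, 0.337, 0.0248),
    (2017, 0.335, 0.0253),
    (2016, 0.346, 0.0273),
]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line([(avg, sd) for _, avg, sd in SEASONS])

# Negative residual: less spread than the season's mean would suggest
residuals = {year: sd - (intercept + slope * avg) for year, avg, sd in SEASONS}
tightest = min(residuals, key=residuals.get)
```

With only eight points, the line itself is loosely determined, so treat the residuals as a ranking aid rather than a precise estimate.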

Here we see that the spread in the drag values in 2023 is the lowest relative to what we'd expect given the other seasons.

Next is to see how it looks day to day (minimum 200 fastballs per day). And this data matches what you are seeing in the plot on the site:

year  num_days  avg_pitches_per_day  sd_of_mean_cd
2023       106                 1269         0.0037
2022       179                 1314         0.0044
2021       182                 1380         0.0050
2020        66                 1378         0.0049
2019       182                 1439         0.0040
2018       182                 1384         0.0043
2017       179                 1390         0.0057
2016       179                 1416         0.0051

The day to day consistency is much tighter in 2023 than it has been in any of the other seasons.

It is also worth pointing out that the tracking systems changed in 2020. So it's possible that some of the "break" you see between 2019 and 2020 might be related to the different tracking system. And so, you might consider only looking at 2020-2023 for that reason. When you do so, 2023 stands out even more in terms of consistency.


Monday, January 16, 2023

Keeping the signature of the ball in the Dark Side of the Moon

In talking about his four-seam grip, Robbie Ray told @pitchingNinja (video) that he holds the ball with the signature side away from the batter. The idea is that he'd rather the batter see the white, or light side of the ball, and keep the signature out of sight, keeping it in the Dark Side of the Moon, er Ball. He reasoned that allowing the batter to see markings on the ball could only benefit the batter.

Is he right? Based on what I see: no. Or at least, he's neither right nor wrong.

Let's go through the research. Statcast is able to track the point on the ball that it spins around. And Robbie Ray, a LHP, when he throws to a RHH, does indeed keep the signature in the Dark Side. Of his 1010 pitches to RHH, he has 96% spinning with the signature in the Dark Side, invisible to the batter. The question is if it matters.

And for that, we need an experiment: finding pitchers who don't care where the signature is when they spin their fastballs. And among LHP, we have 50 of them who seem to just place the signature randomly. Carlos Rodon for example threw 751 pitches with the signature in the (hidden) Dark Side, and 673 in the (visible) Light Side. With Rodon specifically, he pitched MUCH better in the Dark Side, essentially the point that Robbie Ray was making. This is also true for the pitcher with the next most pitches thrown, Julio Urias. Indeed, for all six pitchers who threw at least 800 pitches, split about evenly between Dark Side and Light Side, each of them performed better in the Dark Side. Then again, the next six pitchers by number of pitches thrown did better in the Light Side, going directly against Robbie Ray's thesis. The overall average of these 50 pitchers does slightly favor the Dark Side, but the median is on the Light Side. Basically, we're back to square one.

Weirdly, there are 14 LHP who throw predominantly on the Light Side. Since there's no good reason to expose the signature to the batter, this must be a superstition / comfort thing. Would it make sense to make pitchers who have already chosen a preference switch sides, or at least randomly choose sides? Probably not.

Now, how about RHP against LHH? Gerrit Cole is a pitcher with no preference, randomly choosing sides. And when he puts the signature on the Dark Side, he does indeed get better results. Cristian Javier however is the opposite, getting much better performance on the Light Side. Spencer Strider does better Dark Side, while Alek Manoah goes Light Side. In the end, RHP actually perform a bit better with the signature on the (visible) Light Side, both as a mean and median. This goes against Robbie Ray's thesis.

Is it possible that this is an individual pitcher thing, that for some pitchers going Dark Side is better than Light Side, and vice versa? I suppose that's always possible. Ideally, no one will tell Gerrit Cole and Carlos Rodon and the others, and they'll continue going 50/50 on their pitches, as they provide the ideal experiment: They are randomly choosing sides, and they are unaware they are doing so. Everyone keep quiet for one more year please.

Tuesday, November 22, 2022

Seam Orientation Update

We've updated the animation of the pitcher pages to now use their actual(*) seam orientation, rather than the default seam orientation.

For example, we noted a while ago that Felix Bautista throws a 2-seam fastball.  This would almost always make us call it a sinker.  However, it behaves like a 4-seamer.  So the old animation would show a 4-seam orientation.  Instead, now it accurately shows a 2-seam orientation.

Corey Kluber throws a Kluberball.  If you call it a curveball, then we'd show a 4-seam orientation.  We do in fact call it a curveball, but it actually is closer to a 2-seam orientation.  Now you can see that.  

Some pitchers throw 2-seam changeups, and other 4-seam changeups.  Now we show it accurately.  (Well, except for pitchers who throw BOTH, in which case, we choose the one that is most popular.)

(*) The actual that I am referring to, right now, is based on the longitude of the seam orientation. This works fine for most fastballs, curves, changeups, cutters. It does NOT work for sliders, which really needs the latitude as well. All in due time. Right now, our animation just lets us do one dimension, but we've got the second dimension in the pipeline. So someone like Stroman (not that there's anyone like Stroman!) will have his sinker with a pure 2-seam orientation, when it's really more of a 1 seam as he calls it.

Anyway, coupled with the pre-existing spin axis, what you see is the actual spin of the ball, both on the spin axis and the seam orientation. 

Sunday, November 07, 2021

Statcast Lab: Markov Sequences, 4-seamers on 0-1 counts

(Click to embiggen)

This chart shows run values (per 100 pitches) by the strike zone at plate crossing, limited to 4-seam fastballs, 2018-2021, on 0-1 counts, for RHP.

Each box is a 3 inch by 3 inch square. The numbers are “floored”, meaning that “0” means 0 to 2.99 inches, and “3” means 3 to 5.99 inches and so on. (I am also including LHP data, but “mirroring” their data. So technically, all the negative side numbers are on the arm-side, while the positive side numbers are glove-side. For your sanity, just presume RHP.)

So, what do we see? Well, at about 30 inches (2.5 feet) off the ground at close to the center of the plate, run values inside the strike zone are maximized. In other words: run values inside the strike zone peak when the pitch is down the middle. At +/- 12 inches from the center of the plate (so 24 inches wide), we see that pitches still favor the pitcher (even though the plate is 17 inches wide). When batters swing at those pitches that straddle the edge of the strike zone, that’s what happens. Once you go beyond that range, at 15-51 inches off the ground, +/-15 inches to either side of the middle of the plate, it starts to favor the batter. And beyond that, it greatly favors the batter (basically, most of those pitches are called balls).
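The flooring described above is a one-liner. A small sketch, with a hypothetical function name; note that how the negative (arm-side) values get labeled is an assumption here, since a true floor puts -0.5 inches in the "-3" box.

```python
import math

def floor_to_box(inches, box=3):
    """Label a location by the lower edge of its box (3-inch boxes by default)."""
    return math.floor(inches / box) * box

labels = [floor_to_box(v) for v in (0.0, 2.99, 3.0, 5.99, -0.5)]
```

Using `math.floor` rather than `int()` matters on the negative side: truncation would fold -0.5 into the "0" box and make that box twice as wide as the others.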

Question

As I said, that’s at the 0-1 count. What I am interested in is this question: are those run values dependent, at all, on the prior pitch’s location, speed and/or movement? In this case, since I am looking at the 0-1 count, I am now asking about the first pitch strike. Did the kind of pitch thrown for a strike as a first pitch impact how the batter approaches the 0-1 4-seam fastball?

Commit Point

Let’s talk about the decision making region of the batter. The batter does NOT react based on where the pitch crosses the plate. He needs a certain amount of time to react. I’ve nominally set that value as 1/6th of a second (167 msec). Why 1/6th? Well, I looked at a series of checked swings, frame by frame, picking out the “point of no return”, the point at which the batter can no longer safely bail on his swing. At that point, he is committed to swing. And I found that point to be around 1/6th of a second. Interestingly enough, baseball physicist of the 1980s Robert Adair presumed it would be 175 msec. Adair had excellent instincts for his theories, given such limited data available to him.

This is how it looks for a RHH facing Jacob deGrom, trying to decide, at the commit point, whether he sees a 4-seam fastball or a slider. We can see that the trajectory holds very closely (on average), which means that a good deal of the time they intersect. And by the time these pitches reach the plate, they are off by well over a foot.

So, I’m going to do something I’ve never done before, and it’s critical to do it this way for what we are discussing. I’m going to show the run values at the Commit Point. In other words, instead of freezing the pitch at plate crossing, like above, I will instead freeze the pitch 167 msec prior to plate crossing. And that’s because it is at that point that the batter has to make his final decision to bail or continue to commit on his swing. We are taking the snapshot at the last point the batter can make his key observation.
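To make "freeze the pitch 167 msec prior to plate crossing" concrete, here is a minimal sketch that steps a pitch backward in time from plate crossing, assuming constant acceleration over that last stretch. The constant-acceleration model, the function name, and the example numbers are all assumptions for illustration, not the actual Statcast calculation.

```python
COMMIT_TIME = 1 / 6  # seconds before plate crossing (about 167 msec)

def position_at_commit(pos_plate, vel_plate, accel, dt=COMMIT_TIME):
    """Position of the ball dt seconds BEFORE plate crossing.

    pos_plate, vel_plate: hypothetical (x, y, z) position and velocity
    at plate crossing; accel: constant (x, y, z) acceleration.
    Uses p(t) = p0 + v0*t + 0.5*a*t^2 with t = -dt.
    """
    t = -dt
    return tuple(
        p + v * t + 0.5 * a * t * t
        for p, v, a in zip(pos_plate, vel_plate, accel)
    )

# Made-up pitch in feet, ft/s, ft/s^2: y is distance from the plate,
# z is height, and the ball is moving toward the plate (negative vy)
commit_xyz = position_at_commit(
    pos_plate=(0.0, 0.0, 2.5),
    vel_plate=(0.0, -120.0, -18.0),
    accel=(0.0, 24.0, -18.0),
)
```

In this made-up case the ball is a bit over 20 feet from the plate and more than 5 feet off the ground at the Commit Point, which is why the zone at the Commit Point sits well above the strike zone.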

This is how it looks for the 4-seam fastball, on the 0-1 count. (Click to embiggen)

Now we can see the run values by the location of the pitch at the Commit Point. While it LOOKS like the strike zone, it is not. It’s that zone at the Commit Point. We see that the run values favor the pitcher when the pitch is 0 to 15 inches toward the arm-side (where 0 inches is the line from the middle of the plate to the middle of the mound). The ideal height of a 4-seam fastball (on an 0-1 count) is 45 to 60 inches off the ground at the Commit Point. And we can see the more the pitch is away from the ideal zone at the Commit Point, the more the pitch favors the batter.

Ideal Zone at Commit Point

So, from this point onwards, we are going to focus on that Ideal Zone. At the Ideal Zone, we see the run value is roughly minus 1 run per 100 pitches. (The more minus, the more runs are reduced, and so favors the pitcher.) That’s on an 0-1 pitch, for a 4-seamer. This is what the pitcher is targeting if he’s throwing a 4-seamer. Now, we can finally ask the question:

Given that the pitcher wants to throw at this zone, does it matter what the prior pitch was? Does it matter if the prior pitch was a 4-seamer or not? Does it matter how close that first strike pitch was to our current 0-1 pitch in terms of the path it followed? Well, we can finally answer that question.

Markov Prior Pitch Type

So the first thing we’ll look at is see what the prior pitch was, and what the run value is of the 2nd pitch (4-seamer on 0-1 count) that crossed the Ideal Zone.

  • -1.5 runs, when prior pitch was 4-seamer
  • -1.1 runs, when prior pitch was Cutter
  • -0.9 runs, when prior pitch was Sinker
  • -0.7 runs, when prior pitch was Changeup
  • -0.3 runs, when prior pitch was Slider
  • -0.2 runs, when prior pitch was Curve

So, the first interesting finding is if your 4-seamer (on an 0-1 count) is able to cross through the Ideal Zone at the Commit Point, it helps if the prior pitch was a 4-seamer as well. In other words, a 4-seamer first pitch strike followed by a 4-seamer in the Ideal Zone at the Commit Point is what is the most effective. The least effective is the first pitch curve followed by the well-placed 4-seamer.

Markov Prior Pitch Path

Now, what about the actual path of the prior pitch? How close does it need to be to our 0-1 4-seamer in order to be most effective?

Let’s start with the first pitch 4-seamer. When the second pitch is within 3 inches of the first pitch, the run impact is -2.3 runs per 100 pitches, which stands as the best pitch to throw. When the second pitch is between 3 and 9 inches of the first pitch, the run impact is -1.5 to -1.6 runs per 100 pitches. And the more the 2nd pitch deviates from the first pitch, the less effective that 2nd pitch is. In other words: consistency.

I should note that this is at the league level. If there is a bias (and I’ll look for it next time), it would be based on the identity of the pitchers. Until I run that check, everything I’ve said is not definitive (but it is promising). This is the chart for the 4-seamer, based on how much off the trajectory the first pitch is, at the Commit Point:

  • -2.3 runs: 0 to 2.99 inches
  • -1.5 runs: 3 to 5.99 inches
  • -1.6 runs: 6 to 8.99 inches
  • -0.9 runs: 9 to 11.99 inches
  • -0.6 runs: 12 to 14.99 inches
  • -0.2 runs: 15 to 17.99 inches

Now, how about if the 1st pitch was a sinker? In that case, the results were really all over the place. The pattern was up-and-down, thereby suggesting that throwing the 2nd pitch 4-seamer is not dependent on the path of the 1st pitch sinker. But, more work to be done there.

When the 1st pitch is a cutter: it was most effective when the two pitches were within 6 inches of each other, with a run impact of -1.8 runs. So, pairing the cutter-4seamer, along the same path (at the Commit Point) was very effective.

When the 1st pitch is a changeup: the WORST path is when the changeup and 4-seamer shared the same path. In other words, starting with a 1st pitch changeup and then throwing a 2nd pitch 4-seamer, the pitcher does NOT want the two paths to be the same, as this is PLUS 0.2 runs (per 100 pitches). Taking a guess here: the batter is sitting on a 4-seamer, and the pitcher has given the batter a roadmap with the changeup. The batter will be able to jump on the 4-seamer. The most effective 4-seamer on the 2nd pitch, when the 1st pitch is a changeup, is to have a deviation of at least 6 inches.

How about the 1st pitch is a slider? This one is also all over the place. The most effective first pitch slider had a deviation of at least 9 inches, or at most 3 inches. The least effective 2nd pitch 4-seamer is when it deviates from the 1st pitch slider by 3 to 9 inches.

Finally, the 1st pitch curve: results are also all over the place, so no firm conclusions to draw.

Next Step

As I noted, I need to break this down at the individual player level to see how general the trends are, especially with back-to-back 4-seamers.

And of course, looking at all other plate counts, next starting with the 1-0 count and working our way toward the 3-2 count. There are 12 plate-counts, so, that means at least 12 blog posts.

Friday, June 04, 2021

Pascal’s Run Values

A couple of years ago, I introduced a technique to analyze data in a powerful yet simple manner. I'm sure some form or other has been around long before I used it.  Indeed, I was inspired by its use when I first saw former Statcast intern Tess Kolp use something like it.

One of the methods I use often is to group data based on the values.  All 99.5 to 100.5 mph batted balls go into the 100 mph bucket.  The 100.5 to 101.5 goes into the 101 mph bucket and so on.  Simple enough and works quite well.  When it doesn't work well is if you group by temperature as one example.  Games at 40 F and 105 F are few and far between, while those at 70 to 75 are overwhelmingly large.  Once you bin the data, if you don't track the weight (number of games, PA, etc) that make up each bucket, then you will let those few games at 40 F carry the same weight as those at 70 F.

What you can do instead is create a percentile bucket: order all the games from warmest to coldest, then put the 1% warmest games in one bucket, the next 1% warmest games in another bucket and so on. (Thank you Tess!)  This way, you have 100 buckets, equally weighted (same number of games). This also works quite well.  Where it becomes a mess is close to the median (50th percentile) point: the differences in temperature among the 49th, 50th and 51st buckets are going to be extremely small.  Any up/down results you see around that 50th percentile mark will be subject to, essentially, total Random Variation.  

The technique I introduced is a merge of these two bucketing approaches.  I still order the data based on percentiles.  But now, instead of creating 100 buckets of 1% of the data each, I instead create 5 buckets, where the two most extreme buckets each contain 10% of the data.  The middle bucket contains 40% of the data.  The two remaining buckets contain 20% of the data each.  It looks like this: 10%, 20%, 40%, 20%, 10%.

Fans of Pascal's triangle may see a glimmer in there.  The 4th order row of Pascal's triangle has weights of 1, 4, 6, 4, 1.  If we convert those to percentages, that comes to: 6%, 25%, 38%, 25%, 6%.  As you can see, my bucketing is somewhat in-line with Pascal's triangle.

There are some advantages in using the discrete values I'm using, not the least of which is simplicity of description.  You also get to use a lower threshold for your samples.  With a 10/20/40/20/10 scheme, if you have 10 pitches, the fastest pitch goes into one bucket, the slowest into another, the 4 middle speeds into the middle and so on.  If I used Pascal's directly, I'd need 16+ pitches to make it work.  Anyway, I like it simple and 10/20/40/20/10 appeals to me.
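Here is a minimal sketch of the 10/20/40/20/10 split, along with the Pascal's triangle percentages for comparison. The function names and the rule for assigning a pitch to a group (by the midpoint of its rank) are assumptions for illustration; with 10 pitches it reproduces the 1/2/4/2/1 split described above.

```python
from math import comb

CUTS = (10, 30, 70, 90)  # cumulative percent boundaries for 10/20/40/20/10

def group_for_rank(rank, n):
    """Group (1-5) for the pitch at 0-based rank out of n, fastest first.

    Uses the midpoint of the rank, (rank + 0.5) / n, compared against the
    cumulative cuts, written with integers to avoid float ties.
    """
    for group, cut in enumerate(CUTS, start=1):
        if (2 * rank + 1) * 50 < cut * n:
            return group
    return 5

def pascal_buckets(speeds):
    """Split one pitcher-game's speeds into five groups, fastest to slowest."""
    ordered = sorted(speeds, reverse=True)
    groups = {g: [] for g in range(1, 6)}
    for rank, speed in enumerate(ordered):
        groups[group_for_rank(rank, len(ordered))].append(speed)
    return groups

# Ten made-up fastball speeds from one pitcher in one game
game = [94.1, 93.0, 95.2, 92.4, 93.7, 94.6, 93.3, 92.9, 93.9, 91.8]
sizes = [len(v) for v in pascal_buckets(game).values()]

# 4th row of Pascal's triangle (1, 4, 6, 4, 1) as percentages
pascal_row = [100 * comb(4, k) / 16 for k in range(5)]
```

Running this per pitcher per game, then pooling each group across games, is what keeps park and temperature out of the comparison.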

As a nod to Pascal's triangle, and since I need to give a name to what I'm about to do, I'll call these Pascal's Run Values.  Let's get on with it.

***

I'm going to focus solely on 2020+2021 data, including post-season.  What I do is for every pitcher who threw at least 10 4-seam fastballs (henceforth called Risers) in a game gets split into a 10/20/40/20/10 distribution from fastest to slowest.  By doing it game by game, I neatly bypass any park and temperature effects. That bias goes away.  This is probably the most important part of this technique, that I can control for this bias.  I do this for each pitcher, each game.  Then I simply add things up at the group level.  For Group 1, the 10% fastest pitches of each pitcher of each of their games, the average speed is 95.1mph.  Group 2 is 94.4mph, followed by 93.6, 92.7, 91.8.  In other words, each group's pitches is about 0.8mph higher than the next.

If a pitcher throws a pitcher faster than his usual that day, does he have better performance?  The answer is: yes.  Insofar as Risers are concerned, a pitcher reduces runs at a rate of 0.17 runs per 100 pitches in the fastest (Group 1) pitches.  The next fastest, Group 2, reduced runs at a very similar 0.19 runs per 100 pitches.  The middle group, the average fastball speed for that pitcher that day, was a drop of 0.05 runs per 100 pitches.  The next group was at +0.17 runs of increase per 100 pitches.  And the last group, Group 5, which represents the 10% slowest Risers thrown by each pitcher on those days, was +0.19 runs of increase.

That's a lot of words for what is essentially numbers.  Let me list it for you:

Risers

  • -0.17: Group 1, 95.1 mph
  • -0.19: Group 2, 94.4 mph
  • -0.05: Group 3, 93.6 mph
  • +0.17: Group 4, 92.7 mph
  • +0.19: Group 5, 91.8 mph

Remember, for pitchers, reducing runs is what they are after, and so, a minus sign is good for the pitcher, while a plus sign is good for the batter.

***

I'll now go through each of the other pitch types.

Sinkers

  • +0.20: Group 1, 94.1 mph
  • -0.15: Group 2, 93.5 mph
  • -0.15: Group 3, 92.7 mph
  • -0.10: Group 4, 91.9 mph
  • +0.70: Group 5, 91.1 mph

Sinkers have a clear pattern: don't throw it the very hardest, and definitely don't throw it too slow. When I look to see what is happening with the slow-sinkers, two patterns emerge: they don't break as much, and they are thrown in the Waste Region more than normal.  In other words, a slow sinker is a sign of a mistake pitch (to some extent anyway).

***

Changeups

  • +0.22: Group 1, 86.0 mph
  • -0.50: Group 2, 85.2 mph
  • -0.76: Group 3, 84.4 mph
  • -0.58: Group 4, 83.4 mph
  • +0.20: Group 5, 82.5 mph

This is a similar story as with the sinker: don't throw your changeup too hard or too slow.  And a similar story emerged: those pitches had too much or too little break, and ended up in the Waste Region more than usual. Really, it's about knowing what your pitches can do, since you are going for location.  Throw the pitch harder or slower than your normal, or with more or less break than your normal, and the pitch will go to an unintended location.

And as you can see by the above data, you want your pitches within +/-1 mph of your usual, and definitely not outside of +/-2 mph of your usual.  At least as we've seen with Sinkers and Changeups.  With Risers, it's more, faster, better, stronger.

***

Cutters

  • +0.62: Group 1, 90.0 mph
  • -0.63: Group 2, 89.2 mph
  • -0.09: Group 3, 88.2 mph
  • -0.50: Group 4, 87.2 mph
  • +0.32: Group 5, 86.2 mph

More of the same.  A fast cutter is essentially a slow 4-seamer, and those are bad.  When I look at the pitch location of hard cutters, they are poorly thrown.  So again, same deal as above: too-fast and too-slow cutters are indications of mistake pitches.

***

Sliders

  • -0.36: Group 1, 86.7 mph
  • -1.24: Group 2, 85.8 mph
  • -0.78: Group 3, 84.9 mph
  • -0.61: Group 4, 83.8 mph
  • -0.15: Group 5, 82.8 mph

Second verse, same as the first.  Hard sliders end up in the Waste Region a tremendous number of times, double the normal rate. Again, a sign of a mistake pitch when thrown too hard.  Still, Sliders are tremendously valuable, as even a bad slider is a good pitch.  Of course, it's possible that only good pitchers can throw a slider to begin with, or have the repertoire to balance out the slider.  But that's another story.  For this story, it's the same as the others: consistency is the key.

***

Curveballs

  • -0.15: Group 1, 81.2 mph
  • -0.63: Group 2, 80.4 mph
  • -0.84: Group 3, 79.3 mph
  • -0.37: Group 4, 78.3 mph
  • +0.69: Group 5, 77.2 mph

How many verses are we up to?  Once again, you want your curveballs to be thrown at a consistent speed.  Not too fast, and certainly not too slow.  

***

If we equally weight all our non-4-seamers, we get this:

Simple average, non-4-seamers

  • +0.11: Group 1
  • -0.63: Group 2
  • -0.53: Group 3
  • -0.43: Group 4
  • +0.36: Group 5
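The simple average can be rechecked from the five lists above. This is only a sketch using the rounded values as printed, so a result can land about 0.01 away from the figure listed (the listed figures were presumably computed from unrounded data).

```python
# Run values per 100 pitches, Groups 1-5, copied from the lists above.
run_values = {
    "Sinkers":    [+0.20, -0.15, -0.15, -0.10, +0.70],
    "Changeups":  [+0.22, -0.50, -0.76, -0.58, +0.20],
    "Cutters":    [+0.62, -0.63, -0.09, -0.50, +0.32],
    "Sliders":    [-0.36, -1.24, -0.78, -0.61, -0.15],
    "Curveballs": [-0.15, -0.63, -0.84, -0.37, +0.69],
}

# Equal weight for each pitch type, regardless of how often it is thrown.
simple_average = [
    sum(values[g] for values in run_values.values()) / len(run_values)
    for g in range(5)
]
for g, avg in enumerate(simple_average, start=1):
    print(f"Group {g}: {avg:+.3f}")
```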

It is an unmistakable pattern.  You want to be consistent, meaning around 1 mph of your personal norm, with slightly faster better than slightly slower.  You definitely don't want to go too fast, and certainly not too slow, as those are likely indicators that you are not in full control of the pitch.  Except for 4-seam fastballs: those you want to throw as hard as you can.

***

Next?  I will look at horizontal break, vertical break, total break, and... something I have never done before: Active Spin (or its complement, Gyro Spin).  In other words, how much Gyro do you want?  Is it better to have too much or not enough?  Or is consistency the key?  I have no idea (yet).  But we'll learn the answers to that together, next time. 

(8) Comments • 2021/06/14 • Ball_Tracking

Sunday, January 24, 2021

Statcast Lab: Diverging Fastballs and the Seam-Shifted Wake

On the Spin Leaderboards on Savant, you can see charts like the one above for every pitcher (shown from the batter's point of view). LHP Max Fried has one of the best fastballs in baseball. In the chart on the left (Spin-based movement), you can see that both his 4-seamer and his sinker are thrown very similarly, between 12:00 and 1:00. Ceteris paribus, we’d expect the ball to move similarly. However, the range of his fastballs, in terms of the observed total movement, is from 11:00 to 2:30. In other words, Fried is able to manipulate the ball in such a way as to get a pretty wide range in results.

One of the theories as to how this happens (not only for Fried but for everyone) has been offered by Barton Smith, dubbed the Seam-Shifted Wake. Basically, by manipulating the orientation of the seams, you can trigger the airflow around the ball to push the ball in a certain direction more than it would otherwise move.

We can represent the two clock face movement charts onto a single chart like so:

The x-axis is the previous clock face that you saw on the left (Spin-based movement), and the y-axis is the one from the right (Observed Total Movement). You can see that his fastballs are all thrown roughly between 12:00 and 1:00 (or 13:00 ... I had to switch to the 24 hour clock for sorting purposes). It’s a very, very tight cluster, showing that Fried has a pretty strong command of how he releases his fastball. In terms of movement, you can see all of his sinkers are above the line, around 2:00 (or 14:00), which means getting a lot of tail on his sinker. (His changeup as well.) His 4-seamer is more centered, with more ride than tail usually. There is even enough movement that it sometimes cuts toward the glove side.


We can further recast the chart as follows. We keep the same x-axis, and show the y-axis as the Deviation in Movement. This chart makes it clearer that his sinker will deviate 1 to 2 hours toward the arm side from his original release. His fastball is more centered, but can cut almost one hour toward his glove side.
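The deviation is just the gap between the two clock readings. A minimal sketch, assuming clock strings as input and a clockwise-positive sign convention (both are my assumptions; which direction counts as arm side depends on the pitcher's handedness and the point of view):

```python
def clock_to_hours(clock):
    """Convert a clock-face string such as "1:30" or "13:30" to hours (1.5)."""
    hours, minutes = clock.split(":")
    return (int(hours) % 12) + int(minutes) / 60


def deviation_hours(spin_based, observed):
    """Signed gap, in hours, from spin-based to observed direction.

    The result is wrapped into [-6, +6) so that 11:30 -> 12:30 reads
    as +1 hour rather than -11.
    """
    gap = clock_to_hours(observed) - clock_to_hours(spin_based)
    return (gap + 6) % 12 - 6


print(deviation_hours("12:30", "2:00"))   # 1.5
print(deviation_hours("12:30", "11:45"))  # -0.75
```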

Now, the study.

The study

Now, it’s all fine and dandy to show that a pitcher can command his throws to such an extent. But, is the pitcher actually getting a benefit here? Is he able to get runs down? In the case of Fried, it’s a resounding yes.

Using Run Values as I’ve discussed earlier, we can break up Fried’s 600 pitches by deviation from more arm side movement to the other extreme of more glove side movement. For a pitcher, we want runs to go down, and so, the lower the number the better. And we see that for Fried, when he gets more arm side movement, his pitches will reduce runs (up to a point… it seems you don’t want TOO MUCH deviation… my guess is at that point, it means the pitch gets away from the pitcher and so might simply tail out of the strike zone). His 4-seamer that cuts only just a little (by about 0.5 hours, or 30 minutes) is the worst outcome for him.

So, this is great, right? Well, I wish I could tell you that this is common behaviour for all pitchers. I looked at the 80 pitchers with at least 200 fastballs who had a deviation of at least 0.5 hours each way. In other words, pitchers who have 4-seamers and 2-seamers that cut and tail more than otherwise expected. And the results were all over the place.

Our friend Jared Hughes behaved somewhat like Fried. His best pitches were the 20-30% of those that had the most deviation on the arm side (so more tail than otherwise expected from spin-based movement). But there was a good number of pitchers that were the opposite, benefiting more on the glove side than the arm side. Other pitchers were better when the ball did not deviate to either side. It’s possible that, for those pitchers, the pitches that deviate are “mistake” pitches.

In other words, there’s a difference between throwing a pitch purposely to have deviation compared to throwing a pitch that deviates unintentionally.

More to come…

Thursday, October 22, 2020

Unit Sphere: Spin Axis

​For those not familiar with Alan’s trajectory calculator, it is easily one of the most indispensable tools in the saber toolshed. He also describes the spin axis based on the movement of a ball. That is, not the spin axis out of the pitcher’s hand, but rather the average spin axis over the flight of the ball. We’ve been treating the two as being equivalent, but Jared Hughes would suggest otherwise. In any case, that’s a discussion for a future thread.

For this thread, I just want to show what Alan’s spin axis looks like (wb, wg, ws from his calculator), when it is unitized (meaning the radius of the three dimensions of the spin axis adds up to one) and plotted at the league level. And since it is three dimensions, then we’ll plot it in three dimensions. Here’s how that looks like for a RHP (for LHP, the wb values remain the same, with the wg and ws signs flipped). To navigate, use the left, right and wheel on your mouse.
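Unitizing here means dividing the three components by the length of the spin vector, so that the squares of the three unit components sum to one. A small sketch with made-up component values (the function name and the numbers are illustrative assumptions, not output from the calculator):

```python
import math


def unitize(wb, wg, ws):
    """Scale a spin vector so that its length (radius) is 1."""
    magnitude = math.sqrt(wb**2 + wg**2 + ws**2)
    if magnitude == 0:
        raise ValueError("a spin-less pitch has no spin axis")
    return wb / magnitude, wg / magnitude, ws / magnitude


# Hypothetical spin components in RPM, not taken from any real pitch.
ub, ug, us = unitize(1800.0, 600.0, -900.0)
print(ub, ug, us)
print(ub**2 + ug**2 + us**2)  # approximately 1
```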

Black dots are the medians, and gray dots are reference points.

http://tangotiger.net/spin/spin3D_M.html

Thursday, September 03, 2020

Statcast Lab: Components of Movement

Here are two pitches from Trevor Bauer that have the same amount of "break".  This is a pitch down in the dirt, and this is a pitch thrown high up in the zone.

Now prima facie, they don't look anything alike in terms of break.  In fact, the way we commonly describe break, they are not only very similar, but also among the biggest breaking pitches Bauer threw this year.  Why is that?  It's because of the reference point, which is a spin-less throw with no break at release.  In other words, if you draw a straight line from release, and then compare to where the ball actually landed, that gap is the break.  By that logic, the biggest breaking throw was probably thrown by Johnny Damon.

The problem happens, as it usually does, when we boil everything into one number.  Is there another way to describe these two pitches? Yes!  Or at least, let me show you one way.

I like to talk about "the commit point", the point of no return, the point at which the batter once he has committed to swing, must swing, otherwise he'll be called for a checked swing strike.  We set that nominally at one-sixth of a second.  It could be 0.15 seconds, it could be 0.20 seconds.  I really don't know.  I arrived at 1/6th of a second by watching videos of checked swings, and going frame by frame to see at what point I think a batter would commit to the swing.  I'm not a batter, so I don't know exactly.  Anyway, 1/6th is reasonable, without being overly precise like 0.169245 seconds.  If I get better data, then I'll revise.

At the commit point, the pitch that eventually landed in the dirt was almost 3.3 feet high.  Had it crossed the plate, it would have been underground.  So, it dropped 40 inches from commit to plate-crossing.  We can break that 40 inches down into three components:

  • 5.36 inches due to gravity (EVERY pitch drops 5.36 inches due to gravity post-commit)
  • 15 inches due to the trajectory
  • 20 inches due to its downward speed at the commit point
Add that up and there's your 40 inches.
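The gravity component follows from the standard free-fall formula, drop = (1/2) g t^2, over the 1/6th of a second after the commit point. A quick check (the constant names are mine; the 15 and 20 inch components are taken from the list above, not computed):

```python
G_FT_PER_S2 = 32.174   # standard gravity, feet per second squared
COMMIT_TIME_S = 1 / 6  # the nominal commit point from the text


def gravity_drop_inches(seconds):
    """Drop from gravity alone over the given time: (1/2) * g * t^2."""
    return 0.5 * G_FT_PER_S2 * seconds**2 * 12


gravity = gravity_drop_inches(COMMIT_TIME_S)
print(round(gravity, 2))            # 5.36
print(round(gravity + 15 + 20, 1))  # about 40 inches for the pitch in the dirt
```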

For the high pitch with the equally big break, it was 6.36 feet high at the commit point and crossed the plate 4.67 feet high.  That's a drop of only 20 inches from the commit point.  Its three components:

  • 5.36 inches due to gravity
  • 14 inches due to the trajectory
  • 1 inch due to the downward speed at the commit point
Ah-ha, so now we see why those two pitches LOOK so very different even though we give them the same break number.  The arcs are the same, under different vantage points (and so have an overall similar break number), but the batter is not going to think these two pitches have the same break.  And once you break the movement of the pitch down into components, you can better understand the break of a pitch.

(7) Comments • 2020/09/06 • Ball_Tracking Statcast

Wednesday, October 02, 2019

Statcast Lab: Cain v Taylor

This is the point at which Cain got the ball.


Runner is about 75 feet from 3B. Taylor's Sprint Speed is 29 ft/s, meaning he needs 75/29 = 2.6 seconds.

Cain will have to make an almost 200 foot throw. He has a somewhat below average arm at 85 mph. Here's where we need to leave the world of mph and enter the world of feet / sec. 85mph is 125 ft/s. That's at release. The ball will slow down in flight. Roughly speaking, it'll lose 10% every 60 feet. 

In this case, we'd do 200/60 = 3.33, and 0.9^3.33 = 70%. So at arrival, the speed of the ball is 70% of 125 ft/s or 88 ft/s. So the average speed of the ball in flight is about 106 ft/s. And so, a 200 foot throw will get there in about 200/106 = 1.9 seconds. (It's not this straightforward, but it's close enough.)

The exchange time (pickup to release) for a throw is about 0.5 to 0.75 seconds, which means that the ball would have reached the VICINITY of 3B in 2.4 to 2.65 seconds. It would have been close if the throw was on target. Which of course, it might not be.
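The back-of-envelope timing above can be written out as follows. This is a sketch of the same rule of thumb (10% speed loss per 60 feet, average speed taken as the midpoint of release and arrival speed), with function names of my own choosing:

```python
def mph_to_fps(mph):
    """Convert miles per hour to feet per second."""
    return mph * 5280 / 3600


def throw_flight_time(release_mph, distance_ft, loss_per_60ft=0.10):
    """Approximate flight time using the rule of thumb above.

    Average speed is taken as the midpoint of release and arrival speed.
    """
    release_fps = mph_to_fps(release_mph)
    arrival_fps = release_fps * (1 - loss_per_60ft) ** (distance_ft / 60)
    average_fps = (release_fps + arrival_fps) / 2
    return distance_ft / average_fps


flight = throw_flight_time(85, 200)
runner = 75 / 29  # runner's time to 3B at 29 ft/s
print(f"flight {flight:.2f}s, runner {runner:.2f}s")
for exchange in (0.5, 0.75):
    print(f"exchange {exchange:.2f}s -> ball arrives in {exchange + flight:.2f}s")
```

Carrying the unrounded values through gives a flight time just under 1.9 seconds, so the totals come out a hair below the 2.4 to 2.65 seconds quoted above.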

How successful would Cain have been? Probably 60% if the throw is on target. And maybe it's on target 70% of the time? So, about 40% of the time he gets the runner maybe?

In the meantime, it would allow the batter to reach second base as the tying run. But, there were two outs! Making the third out at third base is a cardinal sin for baserunners. Which makes it very appealing for the defense.

Let's work some MORE numbers.

http://tangotiger.net/we.html

Bottom of the 8th, 2 outs, down by 2 runs. Our choices are:

  • runners on 1B and 3B (our baseline)

or

  • runners on 2B and 3B
  • end of inning

So, our baseline is a win expectancy for the Nationals of 15.8%.

  • If Cain went for it and missed, then the win expectancy is 19.2%.
  • If Cain got the out, then the win expectancy for the Nats is 7.1%.

In other words, the tradeoff is that the Nats get +3.4% if Cain doesn't hit the target in time, or the Nats are -8.7% if Cain gets Taylor to end the inning.

All Cain has to do is make the play 28% of the time. That is:

  • 28% of the time, the Nats lose 8.7% 
  • 72% of the time, the Nats gain 3.4%

And that's breakeven.
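The breakeven rate comes from setting the expected change in win expectancy to zero: p × 8.7 = (1 − p) × 3.4. A short check using the win expectancy figures above (variable names are mine):

```python
baseline = 15.8   # Nationals win expectancy (%), runners on 1B and 3B
if_safe = 19.2    # throw to third does not get the runner
if_out = 7.1      # runner is out, inning over

gain_if_safe = if_safe - baseline  # what the Nats gain on a miss
loss_if_out = baseline - if_out    # what the Nats lose on an out

# Breakeven success rate p solves: p * loss_if_out = (1 - p) * gain_if_safe
breakeven = gain_if_safe / (gain_if_safe + loss_if_out)
print(f"{breakeven:.0%}")  # 28%

# Expected change for the Nats if Cain gets the runner 40% of the time.
nats_change = 0.40 * -loss_if_out + 0.60 * gain_if_safe
print(round(nats_change, 2))  # negative: the throw favours the defense
```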

Remember, we guessed that Cain would have gotten Taylor about 40% of the time, and he only needed to get him about 28% of the time.

Cain should have gone to third.

Statcast Lab: Park Bias Report in Pitch Speed

Continuing my look at uncovering park biases, if any, I now turn my attention to Pitch Speed.

The typical way I have done this in the past is the WOWY (with or without you) approach.  It's fairly straightforward, if a bit tricky to code.  You look at pitchers at each park, and compare them to their own speeds in the rest-of-league parks.  So, at Fenway and away from Fenway (and not just Red Sox pitchers, but ALL pitchers who pitched at Fenway and away from Fenway).  You figure out their difference in speeds, weighted by the lesser of their number of pitches in the "two" parks.  Here's how that looks for 2018 and 2019. (click to embiggen)


Now, simply getting non-zero values doesn't mean there is a bias.  We have to figure out how much random variation could have contributed to that.  We see in the above that Yankee Stadium appears at the top in both years, while Globe Life Park was up one year and down the other.  This is a good sign that we've got some level of random variation.  A correlation of the two years gives us an r of 0.50.  This means that about half of what you see (using this method) is signal and the other half is noise.  So seeing +0.30 in 2019 for Yankee Stadium would mean there is a bias of 0.15 mph.  Every other park in 2019 is less than +/- 0.1 mph.  
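To make the mechanics concrete, here is a minimal sketch of the WOWY weighting and the halving of the observed value. The function name, the tuple layout, and the sample pitchers are all hypothetical, made up purely to show the arithmetic:

```python
def wowy_park_difference(pitchers):
    """Weighted average of (speed at park - speed elsewhere).

    Each pitcher is weighted by the lesser of his pitch counts at the
    park and away from it. `pitchers` holds tuples of
    (mph_at_park, pitches_at_park, mph_elsewhere, pitches_elsewhere).
    """
    total_weight = 0
    weighted_sum = 0.0
    for mph_at, n_at, mph_away, n_away in pitchers:
        weight = min(n_at, n_away)
        weighted_sum += weight * (mph_at - mph_away)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical pitchers, not real data.
sample = [
    (93.4, 120, 93.0, 900),
    (91.0, 60, 91.2, 40),
    (95.5, 300, 95.3, 1500),
]
observed = wowy_park_difference(sample)
# With r = 0.50, about half of the observed gap is treated as signal.
estimated_bias = 0.50 * observed
print(round(observed, 3), round(estimated_bias, 3))
```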

This is a very weak bias.  And it's not even clear that this bias would necessarily be at the tracking level. There could be environmental reasons where the release speed is higher in one park or the other.

As I've linked above, and you have seen in my blog the past few months, I have a clearer method to look for park bias: we compare the home pitchers to the away pitchers in the same park.  If for example Citi Field is (literally) home to fireballers, we would not expect the tracking of the away pitchers to also have a high pitch speed.  But, if the Mets pitchers aren't that (pun intended) hot, but the tracking is showing them high, we'd expect the away pitchers to also have their speeds read hot.  

So, a flat line shows zero bias, and a sloped line at 45 degrees shows complete bias.  Here's how it looks in 2018 and 2019, limited to fastballs and sinkers only (click to embiggen):


To say I was sabermetrically ecstatic when I ran this a few minutes ago is to put it mildly.  Citi Field tracks the home pitchers hot, and the away pitchers not.  Which is what you'd expect on a team of fireballers.  Yankee Stadium does show the away pitchers slightly hot, consistent with the WOWY approach I just presented.

However, we can't just look at individual points.  The key is to look at all 60 points.  And all 60 points are scattered all over the place, with no correlation at all between the fastball speeds of home pitchers to their peers in the same park.

Also note that the range in speeds of home pitchers is quite wide, at +/- 1.5 mph, while the range for the away pitchers (made up of basically every other pitcher in the league) is only +/- 0.5 mph (or -0.6 to +0.4).

As I do a year-end analysis of all the data points you've seen me post about in the past few years, I will run these home/away park bias reports, so we can see the extent to which biases exist (if any), and how we need to correct for them (as we saw with the Catcher Framing).

Sunday, July 21, 2019

How is Yelich so much better at swinging at the Heart of the Plate than Trout?

Saber-sleuth @Ellen_Adair pointed out to me that while Trout is leading the league in Run Value Added of +43 runs, with a healthy +14 runs on swings and +29 runs on takes, it seems hard to understand how Trout can only be +4 runs (at the time) at swinging at pitches in the Heart of the Plate.  Shouldn't he be much higher?

To give you context, the league leaders under these conditions are Yelich at +22 runs and Harper at +21 runs.  On the other end is Joe Panik at -16 runs. So, how does Trout get only +4 at swinging at pitches in the Heart of the Plate?

I was of course speechless.  It didn't make much sense.  So, let's break it down and see what's going on.  Remember, we are limiting our look to his swings in the Heart of the Plate.  He is currently at +6 runs.  If we break up his swings into those where it ends the PA and those that don't, it looks like this:

-6 runs on 133 swings where the PA is still alive

  • 44 swings were two-strike fouls
  • 89 swings added a strike: so this is where his -6 runs occur

+12 runs on 128 swings ending the PA

  • +17 runs on 14 HR
  • +19 runs on 34 1B, 2B, 3B, errors
  • -3 runs on 13 strikeouts
  • -2 runs on 3 GIDP
  • -19 runs on 64 batted ball outs

Let's see how Yelich did his magic.  He has 264 swings in the Heart of the Plate, pretty close to Trout's 261.  So, we'll be able to compare the numbers easily enough.  How does Yelich's +28 break down?

-6 runs on 124 swings where the PA is still alive

  • 33 swings were two-strike fouls
  • 91 swings added a strike: so this is where his -6 runs occur

+34 runs on 140 swings ending the PA

  • +35 runs on 25 HR
  • +21 runs on 37 1B, 2B, 3B, errors
  • -2 runs on 9 strikeouts
  • -1 runs on 3 GIDP
  • -18 runs on 66 batted ball outs

The outlier in all that is the HR: Trout has 14 and Yelich has 25.  And that difference of 11 HR represents 18 runs.

And so, the majority of the explanation as to how Yelich is +28 and Trout is +6 runs in swinging at pitches in the Heart of the Plate is that Yelich had 11 more HR and 11 fewer 2-strike fouls than Trout.

Wednesday, July 10, 2019

Revolutions Hand to Plate and Spin RPM

(We're going to have to convert MPH into feet/second in order to understand what I'm about to show.)

A pitch released at 94mph means it is released at 138 feet/second. Of course, the pitch slows down as it is in flight. By the time the pitch crosses the plate, its speed is 126 feet/second. While it does not follow a constant acceleration path, in this short distance it is close enough. Which means that we can figure out its average speed as simply the midpoint of its initial speed of 138 and its final speed of 126, or 132 feet/second.

Since it travels about 53 feet from release to plate crossing, we can also figure out how much time it's in the air: 53 feet at 132 feet/second is 53/132, or 0.40 seconds.

In other words, a pitch thrown at 94mph will be in the air for 0.40 seconds.

If a pitch is thrown with 2400 RPM of spin (revolutions per minute), that means it's thrown with 40 revolutions per second (or 40 RPS). And since we just established that a pitch thrown at 94mph is in the air for 0.40 seconds, then 40 RPS times 0.4 seconds is 16 revolutions.

So, let's recap where we are: using an initial speed, we can estimate an average speed. We used a fixed distance of 53 feet, though pretty much every pitcher releases the ball at 52 to 54 feet. With a speed and distance, we can figure out time. And with time and RPM, we can figure out number of revolutions. In other words, our two main variables are initial speed, and spin RPM.
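The recap can be written as one small function. This sketch takes the plate speed as an input (126 feet/second in the example above) rather than modelling the slowdown, and the function names are my own:

```python
def mph_to_fps(mph):
    """Convert miles per hour to feet per second."""
    return mph * 5280 / 3600


def revolutions_to_plate(release_mph, plate_fps, spin_rpm, distance_ft=53):
    """Estimate revolutions between release and the plate."""
    # Average speed as the midpoint of release and plate speed.
    average_fps = (mph_to_fps(release_mph) + plate_fps) / 2
    flight_seconds = distance_ft / average_fps
    return spin_rpm / 60 * flight_seconds


print(round(revolutions_to_plate(94, 126, 2400), 1))  # 16.1
```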

When we chart that out for each pitch tracked in 2019, we get these numbers of revolutions.


For those who are a bit math savvy, take the jump below:

Read More

(12) Comments • 2019/07/14 • Ball_Tracking

Saturday, April 20, 2019

How much better is it to throw your fastball at 95mph than 91mph?

This is what I did: For each pitcher in 2018, I selected all his fastballs (four-seamers, two-seamers, sinkers), and ordered them from fastest to slowest.  I took each pitcher's 10% fastest pitches and put them in Group 1.  I took each pitcher's 10% slowest pitches and put them in Group 5.  I took each pitcher's next 20% fastest and put them in Group 2, and his next 20% slowest and put them in Group 4.  Finally, his middle 40% pitches by speed are in Group 3.  So, all pitches from Group 1 to Group 5, fastest to slowest.

There were just over 300 pitchers with 500+ fastballs, and that becomes my group of pitchers.  The average speed of each group of pitches follows:

  • 95mph: Group 1
  • 94mph: Group 2
  • 93mph: Group 3
  • 92mph: Group 4
  • 91mph: Group 5

Oh, one more thing.  For each pitch, I establish a "run value". It's the classic Pete Palmer Linear Weights, but at the pitch level, rather than the plate appearance level.  The more negative the run value, the more runs are suppressed.  Negative is good for the pitcher.

So, for each group, I simply figured the average run value, per 100 pitches (roughly a full start).  And the simple average among these 300 pitchers was as follows:

  • -0.09 runs: Group 1
  • -0.22 runs: Group 2
  • -0.10 runs: Group 3
  • +0.08 runs: Group 4
  • +0.38 runs: Group 5

Now, this is very interesting.  While throwing faster does in fact get better results, it's not totally dispositive.  It is possible that at the very highest speed, the pitcher is losing... something.  Maybe control? I'll have to do a breakdown.  (Or better yet, the aspiring saberist out there can do that.) Otherwise, for the other groups of speed (Groups 2 through 5), there's a 0.1 to 0.3 runs of gain, per 100 pitches, per 1 mph.  

In terms of per 9 IP, you can multiply all that by 1.5. So, Group 5 to Group 4 is an improvement of about 0.45 runs per 9IP for that one extra mph.  From Group 4 to Group 3, it's 0.27 runs per 9IP.  From Group 3 to Group 2, it's 0.18 runs per 9IP.  From Group 2 to Group 1, it goes the other way.

So, it depends how you want to look at it.  If you look at Group 5 to Group 1, that's 4 extra mph, and an improvement of 0.47 runs per 100 pitches, or 0.70 runs per 9IP, or about 0.18 runs per 9IP per mph.  If you look at it from Group 4 to Group 2, that's 2 extra mph and an improvement of 0.30 runs per 100 pitches, or 0.45 runs per 9IP, or about 0.22 runs per 9IP per mph.

Therefore, I think we can safely say that it's about 0.20 runs per 9IP per mph.
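The conversions above can be reproduced directly from the two lists of group averages. A small sketch (the helper name is mine; the 1.5 multiplier is the one used in the text):

```python
# Average run value per 100 pitches, Groups 1-5 (fastest to slowest).
run_value = {1: -0.09, 2: -0.22, 3: -0.10, 4: +0.08, 5: +0.38}
speed_mph = {1: 95, 2: 94, 3: 93, 4: 92, 5: 91}
PITCHES_TO_9IP = 1.5  # the multiplier used in the text


def runs_per_9ip_per_mph(slow_group, fast_group):
    """Runs saved per 9 IP for each extra mph between two groups."""
    runs_saved = run_value[slow_group] - run_value[fast_group]
    mph_gained = speed_mph[fast_group] - speed_mph[slow_group]
    return runs_saved * PITCHES_TO_9IP / mph_gained


print(round(runs_per_9ip_per_mph(5, 1), 2))  # 0.18
print(runs_per_9ip_per_mph(4, 2))            # roughly 0.22 to 0.23
```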

***

What is interesting about this is that this is consistent with findings from a decade ago.  In the Rule of 17, pitchers as starting pitchers give up 17% more runs than those same pitchers would give up as relievers.  Which roughly translates to about 0.70 runs.  And those pitchers, as relievers will throw 3 or 4 mph faster than as starters.

***

In the comments, I'll take a look at the other pitches in the arsenal to see if this effect applies to them as well.  Stay tuned.

(20) Comments • 2019/04/23 • Ball_Tracking Statcast

Wednesday, March 27, 2019

WOWY Framing, part 5 of N: barnstorming testing

In part 4, I talked about how to test for parks.  You really should read that first, as what I'm about to say next will be... unconventional.

Thanks for reading that.  This test will be based on the away team of each park.  Let's take the Expos (*).  They visit Shea (**).  The catchers for the Mets are now labelled "host of Expos".  Expos visit Fulton County (***) and the Braves catchers are also labelled "host of Expos".  So, I will have "Expos away" paired with the rest of the NL (****) "host of Expos".  I repeat all this for every team.

  • (*) yes, I am still in denial
  • (**) yes, I am still stuck in 1994
  • (***) yes, the Expos broke the division string
  • (****) no interleague play still

So, what do we expect?  Well, without doing ANY adjustments, we should see no correlation.  After all, we have the Expos not tied to any one park, nor to any one team.  The Expos are barnstorming their way around the league.  As are all other teams. This is the perfect control group. This is the image on the left below.

And AFTER we apply adjustments, we should ALSO not see any correlation.  This is how we can make sure we haven't overfitted the data.  Sweet right?  This is the image on the right below. (click to make bigger)

And this looks pretty good to me.


WOWY Framing, part 4 of N: pitcher and park adjustments

One of the key components of the WOWY process is being able to identify and effectively neutralize variables.  In this case, for Catcher Framing, we are interested in accounting for the pitcher and park.

One of the best ways to know that you need a park adjustment to begin with is to compare the performance of the home and away catchers.  Ideally, the correlation should be 0.  After all, they are two independent groups of catchers.  Well, they are dependent: they play in the same park.  

This is what it looks like for 2015-2018, doing Catcher Framing without doing a WOWY.


As you can see, there's a strong park influence.  The runs saved by the home catchers are certainly not independent from the away catchers.  The correlation is a high r=0.48.

Now, using the WOWY process, this is what I end up with.

I was able to remove the park influence, but it is possible I went slightly too far.  This correlation is r=0.14, and in the negative direction.

Also notice how the points of the away catchers got much tighter (the data points along the y axis are much more compressed).  The home catchers of course won't compress as much, since those catchers are linked to their home parks.  The away catchers are essentially pretty even across all the parks.

Anyway, so, any time you create a metric, always test for a park effect.  And after you (think you have) neutralized for the park, test again and make sure there's no more bias, or at least have the bias reduced a good deal.
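The test itself is just a correlation between two lists, one value per park. A minimal sketch with a hand-rolled Pearson r; the framing numbers below are hypothetical placeholders, not the 2015-2018 results:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical framing runs, one entry per park.
home_catchers = [6.0, -3.0, 1.5, -8.0, 4.0, 0.5]
away_catchers = [2.0, -1.0, 2.5, -3.0, 0.5, -1.5]
print(round(pearson_r(home_catchers, away_catchers), 2))
```

Run it once on the unadjusted metric and again after the adjustment; the second r should sit close to zero.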

(7) Comments • 2019/08/30 • Ball_Tracking Parks

Sunday, March 24, 2019

WOWY Framing, part 2 of N: with or without Bartolo

In 2018, Bartolo Colon threw 500 pitches in The Shadow Zone.  Among the 156 pitchers with at least 300 such pitches, his 56.8% called strike rate was fourth highest.  The range was 58.3% down to 35.9%, with a mean of 47.3%.  

The Shadow Zone is the region between the Heart of the Plate (the region where batters want to see pitches, and so virtually every take is called a strike) and the Chase Zone (the region where the pitchers are hoping the batters chase pitches, and so virtually every take is called a ball).  That Shadow Zone nestled between the two is pretty wide, straddling both sides along the edge of the strike zone.

As a result, we can further subdivide The Shadow Zone into Inner Shadow Zone (meaning the part of the Shadow Zone that is part of the textbook strike zone), and the Outer Shadow Zone (or the part just outside the strike zone).  When we do that, we see that Bartolo gets 83% called strike rate in the Inner Shadow and 30% in the Outer Shadow.  The league average is 79% and 22% respectively.  That puts Bartolo above average in both regions.

As we know, the catcher plays a role in getting the called strikes.  Bartolo had three catchers last year.  This is how many pitches he threw to each of them in the Inner Shadow, and how often he got the called strike:

  • 0.832 (203) Chirinos, Robinson
  • 0.795 (44) Perez, Carlos
  • 0.857 (7) Kiner-Falefa, Isiah

With each of them, he got above average called strikes.  Is this about Bartolo?  Or, did Bartolo happen to have three catchers, each of them above average?

Welcome to WOWY, With or Without You.  Chirinos faced 29 (!) different pitchers.  The one he paired with the most was Bartolo, which means he had 28 pitchers without Bartolo.  From Cole Hamels and his 145 pitches in the Inner Shadow down to Zac Curtis and his 1 pitch.  Without Bartolo on the mound, these 28 pitchers threw 1277 pitches in the Inner Shadow.  Their called strike rate was 73%.  Since we can take a reasonably small step to call these 28 pitchers "average", we can therefore compare Bartolo to these 28 pitchers through the "common catcher", and say that Bartolo is +10% in terms of getting the called strike rate.

We can repeat this exercise with his other two catchers.  Kiner-Falefa got 72% called strike rate without Bartolo and Perez got 73% called strike rate without Bartolo.

So, adding everything up, his catchers, without Bartolo, caught 1959 pitches in the Inner Shadow from 40 other pitchers, of which 73% were called strikes.

By going through this process, we can establish how much of an effect each pitcher has on each catcher.  Bartolo therefore is about +10% in the Inner Shadow.

There are two other things I haven't mentioned.  One is the park.  But in this particular case, Bartolo threw in about the same parks as his peers (the other pitchers of his catchers), so that's not really an issue.

The other is Random Variation.  Because as much as we have OBSERVED Bartolo to be +10% on 254 pitches in the Inner Shadow, it is still only 254 and so subject to Random Variation.  When we remove the effect of that, it cuts his impact by about half. And so I credit him with +5%.
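Putting the Inner Shadow numbers together: a sketch that compares Bartolo to each catcher's without-Bartolo rate and then applies the "cut in half" step. Weighting each catcher by Bartolo's pitches to him is my assumption, and because the inputs are the rounded rates quoted above, the result lands slightly under the +10% and +5% in the text.

```python
# (catcher, Bartolo's called-strike rate, Bartolo's pitches,
#  the same catcher's rate with all other pitchers), Inner Shadow only.
pairings = [
    ("Chirinos", 0.832, 203, 0.73),
    ("Perez", 0.795, 44, 0.73),
    ("Kiner-Falefa", 0.857, 7, 0.72),
]

total_pitches = sum(n for _, _, n, _ in pairings)
# Assumption: weight each catcher's gap by Bartolo's pitches to him.
observed = sum((with_b - without_b) * n
               for _, with_b, n, without_b in pairings) / total_pitches

REGRESSION = 0.5  # "cuts his impact by about half"
print(total_pitches)                    # 254
print(f"{observed:+.1%}")               # +9.7%
print(f"{observed * REGRESSION:+.1%}")  # +4.8%
```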

When we repeat for the Outer Shadow, his 30% called strike rate is compared to his peers (through the common-catchers) of 19%.  That's +11%, which I cut down in half in his case to 5.5%.

And therein lies the problem.  By treating them independently, I'm not taking advantage of the fact that each region can inform on the other.  And so, I really do not want to cut each one in half.  It'll probably be 30%.  But let's talk about that next time.
