Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Fielding

Friday, April 03, 2020

Systematic bias in fielding metrics: when old school stats are actually more reliable

Thought exercise

Suppose you have 10 teams that ALWAYS shift their infielders. Not half the time like the Dodgers, who lead the league. I mean all the time. Against LHH, the 3B plays in short RF, and the SS plays on the right side of the bag. Against RHH, the 2B plays on the left side of the bag. I’ll call these teams The Shifters.

And suppose you have 20 teams that NEVER shift their infielders. Not 13% like the Cubs, lowest in the league. I mean NEVER. In other words, all the infielders are where tradition would dictate. I’ll call these teams The Traditionalists.

Now, since The Shifters and The Traditionalists are playing against the same teams, they all face a similar distribution of batted balls. Suppose that The Shifters are getting a few more outs out of their infielders against LHH, but they’ve overshifted and are getting a few fewer outs against RHH. In other words, overall, there’s the same number of outs.

Let’s further suppose that the 3B on The Shifters are converting 3 outs per game, and the 3B on The Traditionalists are ALSO converting 3 outs per game. It’s just that the 3B on The Shifters, against a LHH, are getting them all from short RF, while against RHH they are all closer to the 3B line. The 3B outs against The Traditionalists are all where you’d expect them.

Enter The Zone

The way “zone” systems work is that they assign a “zone of responsibility”. They decide which zone each OFFICIAL POSITION is responsible for. The “official position” is technically the fielding position on the batting lineup. And they figure that out based on the zones in which each position is converting plays into outs. Since plays in the short RF zone are rarely converted into outs by an infielder (only 10 of the 30 teams have their 3B there against LHH, and none against RHH), that zone does not belong to any infielder.

So, these zone systems have a numerator (outs) and denominator (plays) for all the zones that an OFFICIAL POSITION owns. What happens to outs made “out of zone”? In some systems, they get added to the numerator only. In other systems, they get added to both numerator and denominator. What happens to hits “out of zone” of the 3B? Well, those either go to the “in zone” of one of the infielders, or they just go away altogether. In other words, a basehit by a LHH to short RF against The Shifters that is not in-zone to the 2B is ALSO NOT in-zone to the 3B. This basehit disappears. The denominator for The Shifters just got lower.

For a zone-based system with a league of 10 The Shifters and 20 The Traditionalists, this is a systematic bias. And the more data you have, the worse it is. That’s because the more data you have, the less Random Variation there is, and the more the Systematic Bias will expose itself.
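The bookkeeping behind that bias can be sketched in a few lines. This is only an illustration: the function name `zone_rating` and every count below are hypothetical, chosen so that both third basemen face 100 balls and convert 75 of them. The two treatments of out-of-zone outs follow the description above, not any specific published system.

```python
# Hypothetical season counts for one 3B on each kind of team. Every number
# here is invented for illustration; only the bookkeeping follows the post.

def zone_rating(in_zone_outs, in_zone_plays, ooz_outs, ooz_in_denominator):
    """Outs divided by plays, with out-of-zone (OOZ) outs handled one of two ways."""
    numerator = in_zone_outs + ooz_outs
    denominator = in_zone_plays + (ooz_outs if ooz_in_denominator else 0)
    return numerator / denominator

# Traditionalist 3B: 100 balls, all in his zone, 75 converted into outs.
trad = zone_rating(75, 100, 0, ooz_in_denominator=True)

# Shifter 3B: also 100 balls and 75 outs, but 30 of the outs came from
# short RF (out of zone), and 10 hits to short RF belong to no infielder's
# zone, so those 10 hits never reach any denominator.
shift_numerator_only = zone_rating(45, 60, 30, ooz_in_denominator=False)
shift_both = zone_rating(45, 60, 30, ooz_in_denominator=True)

print(f"Traditionalist 3B:            {trad:.3f}")
print(f"Shifter 3B (numerator only):  {shift_numerator_only:.3f}")
print(f"Shifter 3B (both):            {shift_both:.3f}")
```

Under either treatment the Shifter rates higher than the Traditionalist despite identical outs on identical chances, and adding more seasons of data does not shrink that gap.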

Exit The Zone

An old-school system doesn’t have this issue. That’s because EVERY batted ball is assigned to a fielder. All basehits are, effectively, shared among all seven fielders, regardless of where the fielders are, or the ball went. You may think this is a problem. And, in the short-run, it is. But given 5 or 10 or 15 years of data, that Random Variation will get reduced substantially. And so, if you have 3 outs per game being made by 3B among The Shifters and 3 outs per game by 3B among The Traditionalists, they will all look the same. That’s because the OFFICIAL POSITION is what is being held responsible not where they are standing. This makes no sense on a single play level. This is still a systematic bias. It’s just a different kind of systematic bias than a zone-system.

Where are we today?

Now, all of that is theoretical. The point of all that is to understand how things work. Where are we in 2019? I don’t know. That’s where The Aspiring Saberist comes into play. Hopefully, that’s one of you reading all this.

With regards to the way Statcast works: we remove positioning altogether. And so, the area of responsibility is where you are actually standing for that particular play. If a 3B is in short RF, he is not a 3B. He is “fielder in short RF”. We don’t care about his official fielding position on the batting lineup card. Just saying that should make it clear why we can’t use his official fielding position on the batting lineup card. That’s not how responsibility works. Where you stand IS what you are responsible for. So, whether your official fielding position on the batting lineup card is 2B, SS, or 3B, and you are standing in short RF, then we use that Fielder Role to establish responsibility. Your Role establishes your Responsibility.

Wednesday, March 25, 2020

Statcast Lab: Is there a different run value needed based on the infield slice?

One of the things that we’ve done in the long past is to give a different run value for 1B/3B, compared to 2B/SS. The idea was simple enough to understand: if a 2B or SS allowed a hit, it was likely a single. And if it was a 1B/3B, there’s a chance that it could be an extra base hit down the line.

Seems reasonable enough. So, what we ended up doing, in the long past, was to give .75 runs per play for 2B/SS and .80 runs for 1B/3B. Again, seems reasonable enough.

I looked at the Outs Above Average (for infielders only; I’ll do outfielders later today or tomorrow). And while the direction of that theory holds, the magnitude does not hold quite as much. For the 2B/SS roles, the impact of their play is -.005 runs, compared to the average infield play. While for the 1B/3B roles, the impact of their play is +.010 runs, compared to the average infield play. (The overall WEIGHTED average is 0, and you get there because there’s about 2X the plays at 2B/SS compared to 1B/3B).

So, the end result is that the gap in runs between the middle infielders and the corner infielders is about .015 runs, not the presumed long past value of .050 runs.
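As a quick check of that arithmetic, here is a small sketch. The 2:1 weighting is an assumption that treats "about 2X the plays" as exactly double; the variable names are illustrative only.

```python
# Run-value offsets relative to the average infield play, as quoted above.
MIDDLE_OFFSET = -0.005   # 2B/SS roles
CORNER_OFFSET = 0.010    # 1B/3B roles

# Assumption: "about 2X the plays" at 2B/SS is treated as exactly 2:1.
MIDDLE_WEIGHT = 2.0
CORNER_WEIGHT = 1.0

def weighted_average(values, weights):
    """Weighted mean of values, using weights as relative play counts."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

overall = weighted_average(
    [MIDDLE_OFFSET, CORNER_OFFSET], [MIDDLE_WEIGHT, CORNER_WEIGHT]
)
observed_gap = CORNER_OFFSET - MIDDLE_OFFSET   # gap found with OAA
legacy_gap = 0.80 - 0.75                       # long-past run values

print(f"weighted average offset: {overall:+.4f}")
print(f"observed gap: {observed_gap:.3f} runs, legacy gap: {legacy_gap:.3f} runs")
```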

Why would that be? It’s probably easiest to say that 5% of the “assigned hits” are extra base hits. But as we know, a lot more than just 5% of hits are extra base hits, even if we limit it to the infield. For example, almost 10% of groundballs are extra base hits. So why the discrepancy? Well, half of those groundball extra base hits are “automatic hits”. In other words, they are hits not because the fielder wasn’t good enough to get there, but rather because his POSITIONING didn’t give him a chance to get there. And since Outs Above Average takes as an assumption of fact that the positioning of the player is not a skill of the player (easier to believe these days with shifting), then those auto-hits are not opportunities for the player. They end up being noise.

When we get to Layered Hit Probability (and by extension Layered wOBA), we will recover those “lost” hits, and be able to properly assign them to “team fielding alignment”. But, for the Outs Above Average metric, those aren’t in play (no pun intended).

Ok, so you may be thinking, we lost half, so maybe instead of the long past value of .050 runs, maybe it should be .025 runs? That is a good thought. Except, a lot of those remaining extra base hits that are assigned to the fielder are “really difficult”. In other words, they remain in the pool for the player, but the hit probability is so low that they have limited damage to the fielder.

So, if you want a quick summary: the kind of hit that an infielder is responsible for is almost always a single. And because of that, when you look at outs saved, the translation to runs saved will be almost identical for middle infielders as for corner infielders.

Next time, I’ll compare IF to OF.

(2) Comments • 2020/03/27 • Fielding

Tuesday, January 14, 2020

Statcast: Do fielders perform worse when they are in a Shift formation?

The theory would be that by being out of position, a fielder will have less familiarity with a situation and so will perform worse than at his "natural" location. If this is true, we should see it in Outs Above Average. I did something fairly simple: what is the OAA for a TEAM INFIELD if we have the shift on? And what is the OAA for a TEAM INFIELD if all the fielders are in their standard location?

Note: A shift is any formation where you have 3 or more infielders to one side of the bag.

So, there is an effect and in the direction you'd expect. But not the magnitude. When an infield is playing in its standard formation, they convert 0.0003 more outs per play. With about 2000 plays a season, that works out to 0.6 more outs per season. Hardly a number to worry over, even if you can make the case that it is "true" that they perform "worse".

HOWEVER. However, because shifts are disproportionately set with a LHH, we can break down the OAA of the team infield between LHH and RHH. And when we do that, well, things start to change. With a RHH, the infield does perform better in its standard formation, by a whopping 0.006 outs per play, which is 12 plays per season. Since a shift with RHH essentially means moving the second baseman from the right side to the left side, it is that positioning that we can narrow down as the culprit. This is also consistent with other research I've shown in the past where the performance of RHH on shifts is noticeably worse for the fielding team.

As for LHH, the OAA is slightly BETTER when the infield is in a shift formation, by 0.0026 outs per play, or 5 outs per season. At this point, the "familiarity" issue likely no longer applies, given that one-third of LHH plate appearances are being shifted. This may also explain why LHH on shifts is somewhat better for the fielding team: in addition to getting the fielders in a better spot, they perform slightly better when in those spots.
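The per-season figures in this post all come from the same conversion, sketched below. It follows the post in applying the same "about 2000 plays a season" to every split, which is a simplification; the function name is illustrative.

```python
PLAYS_PER_SEASON = 2000  # "about 2000 plays a season", per the post

def outs_per_season(outs_per_play, plays=PLAYS_PER_SEASON):
    """Scale a per-play OAA difference up to a season."""
    return outs_per_play * plays

overall_standard_edge = outs_per_season(0.0003)   # standard formation, all batters
rhh_standard_edge = outs_per_season(0.006)        # standard formation, RHH
lhh_shift_edge = outs_per_season(0.0026)          # shift formation, LHH

print(round(overall_standard_edge, 1), round(rhh_standard_edge, 1), round(lhh_shift_edge, 1))
```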

This is all preliminary, so it'll be interesting to break this down in the coming weeks and months.

Update: I should note that I did not control for the quality of fielders. So, if a team that shifts more happens to do so with better fielders with LHH, then that would explain the results we see. And if a team that shifts more happens to do so with WORSE fielders with RHH, that would ALSO explain the results we see. As I said, this is the first step.

(6) Comments • 2020/01/15 • Fielding Statcast

Thursday, January 09, 2020

Statcast Lab: How much do we want to adjust for the direction of a batted ball?

This is what has been perplexing me for months.  Is this a bias we want to remove, or retain?  That shows the OAA (Outs Above Average) for balls hit in the hole.  The black line is pretty sweet at 0.  But the bLue (for left, or 3B) is above average and the Red (for right or SS) is below average.  And this IS what we'd expect: for balls that both players can reach, the 3B will have a better shot at getting the runner than the SS, simply based on the running direction.  The 3B is closing the distance as he gets the ball, while the SS is enlarging it by the time he releases it.

Therefore, do we want to adjust this "bias"?  When Fred Lynn, LHH, faces a RHP, he has an advantage over Jim Rice, RHH.  But, we don't adjust that away, since being a LHH is part of Lynn.

We could make the argument that since the Rockies are positioning Story and Arenado, that Story has no choice.  So, we can't penalize him.  But, all fielders have some leeway, pitch to pitch to move around, as they respond to expected pitch types and locations and runner leads.

So, a ball may be hit halfway between Story and Arenado, but that's only AFTER Story and Arenado take their spot on the field.  Because of the nuanced nature of being able to move two or three steps in response to the pitch call and runner movement, what we'd actually want is the pre-nuanced, club-controlled spot for each fielder.  Which is unknown.

If we remove the bias altogether, and make SS and 3B equal on these plays, we are saying that SS and 3B are equal in getting to balls in the hole.  Which we can only say if we add the condition of "compared to other players at their locations, not to each other".

And that chart above is going to look kinda confusing if you see the black, red, and blue lines all sitting one right on top of the other.

What would YOU like to see?

(12) Comments • 2020/01/25 • Fielding

When is a gimme not a gimme? Statcast Infield Defense style.

On 95%+ plays, with an average 97% out rate, Baez made all 147 plays (100%, compared to an expected 142 outs). That's +5

Tatis on similar plays: on 111 plays, he made only 101 outs (91%) compared to an expected 107, or -6.

In the outfield, the vast majority of the OAA is based on 2+ star plays.

In the infield, HALF the value is on 90%+ plays.  This is because the chance of a misplay is so much greater on groundballs than airballs.

Making the routine play has tremendous value.

In case you missed it, you can slice/dice right here.

Statcast Infield Defense: Q&A

Lots of excellent questions everywhere on the newly released Infield Defense metric.  Twitter, Reddit, and BTF are among the places I've seen so far.

What I'll do is create a Q&A based on the questions or issues being raised.  I'll start with BTF, and go as far as I can before I go to bed, then pick it up in the morning.  I'll create one Q&A per comment.  So, check it out below in a few minutes.

(73) Comments • 2020/01/10 • Fielding

Wednesday, January 08, 2020

Statcast Lab: History of the Fielding, now with Statcast Infield Defense

Primer article by Mike on MLB.com

Savant main page by Daren, along with drill down and player pages, with Jason bringing the data together.

My tech blog post: a slimmed down web version, and the expanded downloadable PDF.

(5) Comments • 2020/01/09 • Fielding Statcast

Sunday, November 24, 2019

Statcast: Catcher Framing and Called Strike Rate in The Shadow Zone

In an excellent article on Catcher Framing, Mike created this image at the team level, which shows the percentage of called strikes in The Shadow Zone.

He further pointed out:

The top team, Arizona, and the bottom team, Chicago, each had a nearly identical amount of takes in that area, 4,819 for the D-backs and 4,803 for the White Sox. Yet the D-backs, led by good framing from Carson Kelly and Alex Avila, had over 400 more called strikes there.

This puts the impact in stark terms.  Looking at the called strike rate in The Shadow Zone, one catching team can get 200 more strikes than the average team, while another catching team can get 200 fewer strikes.  How much value CAN a strike have?  I can tell you the answer is 0.125 runs per called strike, and so, we're talking about +/- 25 runs.

But, let's describe it in something a bit cruder, but with more relevance. If you think of 3 strikes being a strike out, and 9 strikes being an inning, then 200 called strikes would be about 22 perfect innings.  Each inning generates an average of 0.5 runs, and so, a clean inning saves you 0.5 runs.  If you have 22 of those, then you've saved 11 runs.  That's the crude way.  The better way is 0.125 runs per called strike.
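Both conversions can be put side by side in a short sketch. The constants are the ones quoted in the post; the function names are illustrative only.

```python
RUNS_PER_CALLED_STRIKE = 0.125   # the "better way" from the post
STRIKES_PER_INNING = 9           # 3 strikes per out, 3 outs per inning
RUNS_PER_INNING = 0.5            # average runs scored in an inning

def runs_from_strikes(extra_strikes):
    """Convert extra called strikes to runs with the per-strike run value."""
    return extra_strikes * RUNS_PER_CALLED_STRIKE

def crude_runs_from_strikes(extra_strikes):
    """Crude version: treat every 9 strikes as one clean inning worth 0.5 runs."""
    perfect_innings = extra_strikes / STRIKES_PER_INNING
    return perfect_innings * RUNS_PER_INNING

better = runs_from_strikes(200)
crude = crude_runs_from_strikes(200)
print(f"per-strike value: {better:.0f} runs, crude inning method: {crude:.0f} runs")
```

The crude method lands at less than half of the per-strike value, which is why the post treats it only as an intuition aid.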

As for simply relying on the called strike rate in The Shadow Zone, we can compare that to the runs saved on the strike calls per 100 pitches.  As you can see, an extremely strong relationship.  Indeed, an r of close to 0.95.  So, if you are having a tough time buying into Catcher Framing and runs and how all that is derived, you can take the first step and simply look at its most basic: percentage of pitches called strike in The Shadow Zone.  If you can do that, you'll be 90% of the way there.

Monday, October 21, 2019

Statcast Lab: Catcher Framing, WOWY at the Individual Game-Level

On Sept 15, 2008, at PNC Park, Dodgers catcher Russell Martin caught 19 called pitches in the inside part of the Shadow Zone. That would be zones 11 through 19, within the green dotted line.

While today, those are called strikes almost 80% of the time, it wasn't the case back in 2008. That could be any combination of the umpires improving over time and the tracking system improving over time. So, it would be more accurate to say that he caught those 19 pitches in the reported region noted above. Of those 19, 14 were called strikes.

In that same ballpark on that same day, his teammate A.J. Ellis was also catcher, as was opposing catcher Ryan Doumit. Those catchers caught 18 pitches in the same reported region, but got only 4 pitches called strikes (or 22.2%). Had Martin got the same calls, he would have gotten 19 x 22.2% = 4.2 strikes, instead of his actual 14. In other words, he got 9.8 more called strikes than the other catchers that day in that park.

On April 2nd against the Giants at Dodger Stadium, in the outside part of the Shadow Zone, with Bengie Molina as his opposing catcher, he got 3 strikes out of 20 pitches compared to Molina's 7 for 13. That made Martin MINUS 7.8 strikes that day.
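The game-level comparison used in both examples can be sketched as a single function. The function name and argument names are illustrative; the inputs are the counts quoted above.

```python
def strikes_above_others(catcher_strikes, catcher_pitches,
                         other_strikes, other_pitches):
    """Called strikes a catcher got beyond what the other catchers' rate
    (same park, same day, same reported region) would have given him."""
    other_rate = other_strikes / other_pitches
    expected_strikes = catcher_pitches * other_rate
    return catcher_strikes - expected_strikes

# Sept 15 at PNC Park, inside part of the Shadow Zone:
# Martin 14 of 19, the other catchers 4 of 18.
sept_15 = strikes_above_others(14, 19, 4, 18)

# April 2 at Dodger Stadium, outside part of the Shadow Zone:
# Martin 3 of 20, Molina 7 of 13.
april_2 = strikes_above_others(3, 20, 7, 13)

print(f"Sept 15: {sept_15:+.1f} strikes, April 2: {april_2:+.1f} strikes")
```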

And so we can go through every single game in the same way, and tally up the results. In the Heart of the Plate, he was +63 strikes (+35 at Dodger Stadium, +28 away). However, we would NOT expect any venue bias because of the way we are directly comparing Martin to the other catchers in the same venue on the same day.

  • In the inside part of the Shadow Zone, he was +43 at home, +45 away, for a total of +88.
  • In the outside part of the Shadow Zone: +29 home, +7 away.
  • In the Chase Zone: +39 home, -6 away.
  • In the Waste Area: +1 home, 0 away.

All tallied up: +147 home, +74 away, +221 total. Each strike is about 1/8th of a run, and so those +221 strikes translates to +28 runs.
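The tally is easy to verify from the zone-by-zone figures. The dictionary layout below is just one convenient way to hold the numbers quoted above.

```python
# (home, away) called strikes above the other catchers, by region.
zone_strikes = {
    "Heart of the Plate": (35, 28),
    "Shadow Zone (inside)": (43, 45),
    "Shadow Zone (outside)": (29, 7),
    "Chase Zone": (39, -6),
    "Waste Area": (1, 0),
}

home_total = sum(home for home, _ in zone_strikes.values())
away_total = sum(away for _, away in zone_strikes.values())
total_strikes = home_total + away_total

RUNS_PER_STRIKE = 1 / 8  # "about 1/8th of a run" per strike
total_runs = total_strikes * RUNS_PER_STRIKE

print(f"home {home_total:+d}, away {away_total:+d}, total {total_strikes:+d}")
print(f"runs: {total_runs:+.1f}")
```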

In a more elaborate process that considers more variables and the zone in a more granular fashion, Fangraphs shows +30 runs.

When I repeat this for every year, Martin's career comes out to +171 runs. Fangraphs has a very similar +166 runs.

As much as it strains credulity to think that Martin's framing could have led to +28 runs, I also can't reject that conclusion. I can reduce that number somewhat for the uncertainty level of the measurement. But given the way I controlled for the metric, by directly comparing Martin to the other catchers in the same park on the same day, that's a tough call as well.

I could repeat the above by focusing on each individual bin and controlling for the pitcher, and potentially the batter. But that basically will put me on a path to replicate Fangraphs. And given that without doing any of that I ALREADY match Fangraphs, all I'd be doing is further matching Fangraphs.

So, I don't want to agree with the numbers, but I am forced to.

I should note that we don't see these wide numbers in the past few years. That could be any combination of the umpires improving and the tracking system improving.  It could also be that teams are now very aware not to have a Ryan Doumit behind the plate, so it could be improvement in catcher selection and coaching of catchers.  In other words, whatever inefficiencies exist, they're being slowly closed on all sides.

(2) Comments • 2019/10/22 • Fielding Statcast

Wednesday, October 02, 2019

Statcast Lab: Cain v Taylor

This is the point at which Cain got the ball.

Runner is about 75 feet from 3B. Taylor Sprint Speed is 29 ft/s, meaning he needs 75/29 = 2.6 seconds

Cain will have to make an almost 200 foot throw. He has a somewhat below average arm at 85 mph. Here's where we need to leave the world of mph and enter the world of feet / sec. 85mph is 125 ft/s. That's at release. The ball will slow down in flight. Roughly speaking, it'll lose 10% every 60 feet. 

In this case, we'd do 200/60 = 3.33, and 0.9^3.33 = 70%. So at arrival, the speed of the ball is 70% of 125 ft/s or 88 ft/s. So the average speed of the ball in flight is about 106 ft/s. And so, a 200 foot throw will get there in about 200/106 = 1.9 seconds. (It's not this straightforward, but it's close enough.)

The exchange time (pickup to release) for a throw is about 0.5 to 0.75 seconds, which means that the ball would have reached the VICINITY of 3B in 2.4 to 2.65 seconds. It would have been close if the throw was on target. Which of course, it might not be.
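The timing estimate above can be reproduced with a short sketch. It uses the same simplifications as the text (10% speed loss per 60 feet, and the average of release and arrival speed as the flight speed), so it is a rough approximation rather than a physical model; the function name is illustrative.

```python
MPH_TO_FTPS = 5280 / 3600  # feet per second in one mile per hour

def throw_flight_time(distance_ft, release_mph, retained_per_60ft=0.9):
    """Approximate flight time of a throw that keeps 90% of its speed
    every 60 feet, using the mean of release and arrival speed."""
    release_ftps = release_mph * MPH_TO_FTPS
    arrival_ftps = release_ftps * retained_per_60ft ** (distance_ft / 60)
    average_ftps = (release_ftps + arrival_ftps) / 2
    return distance_ft / average_ftps

runner_time = 75 / 29                       # 75 feet at 29 ft/s Sprint Speed
flight_time = throw_flight_time(200, 85)    # 200-foot throw at 85 mph
# Add the 0.5 to 0.75 second exchange (pickup to release).
ball_arrival_window = (flight_time + 0.5, flight_time + 0.75)

print(f"runner needs {runner_time:.1f} s")
print(f"ball arrives in {ball_arrival_window[0]:.2f} to {ball_arrival_window[1]:.2f} s")
```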

How successful would Cain have been? Probably 60% if the throw is on target. And maybe it's on target 70% of the time? So, about 40% of the time he gets the runner maybe?

In the meantime, it would allow the batter to reach second base as the tying run. But, there were two outs! Making the third out at thirdbase is a cardinal sin for baserunners. Which makes it very appealing for the defense.

Let's work some MORE numbers.

http://tangotiger.net/we.html

Bottom of the 8th, 2 outs, down by 2 runs. Our choices are:

  • runners on 1B and 3B (our baseline)

or

  • runner on 2B and 3B
  • end of inning

So, our baseline is a win expectancy for the Nationals of 15.8%.

  • If Cain went for it and missed, then the win expectancy is 19.2%.
  • If Cain got the out, then the win expectancy for the Nats is 7.1%.

In other words, the tradeoff is that the Nats get +3.4% if Cain doesn't hit the target in time, or the Nats are -8.7% if Cain gets Taylor to end the inning.

All Cain has to do is make the play 28% of the time. That is:

  • 28% of the time, the Nats lose 8.7% 
  • 72% of the time, the Nats gain 3.4%

And that's breakeven.
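The breakeven rate falls out of the three win expectancy values directly. A minimal sketch, with illustrative function names and the Nationals' win expectancies quoted above:

```python
def breakeven_success_rate(we_baseline, we_if_missed, we_if_out):
    """Success rate at which throwing to third leaves the batting team's
    win expectancy unchanged on average."""
    gain_if_missed = we_if_missed - we_baseline   # batting team gains
    loss_if_out = we_baseline - we_if_out         # batting team loses
    return gain_if_missed / (gain_if_missed + loss_if_out)

def expected_we_change(success_rate, we_baseline, we_if_missed, we_if_out):
    """Average change in the batting team's win expectancy from the throw."""
    return (success_rate * (we_if_out - we_baseline)
            + (1 - success_rate) * (we_if_missed - we_baseline))

NATS_BASELINE, NATS_IF_MISSED, NATS_IF_OUT = 0.158, 0.192, 0.071

breakeven = breakeven_success_rate(NATS_BASELINE, NATS_IF_MISSED, NATS_IF_OUT)
# Using the rough 40% success guess made earlier in the post.
change_at_40 = expected_we_change(0.40, NATS_BASELINE, NATS_IF_MISSED, NATS_IF_OUT)

print(f"breakeven: {breakeven:.1%}")
print(f"Nats win expectancy change at 40% success: {change_at_40:+.2%}")
```

A negative change for the Nationals is a gain of the same size for the defense.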

Remember, we guessed that Cain would have gotten Taylor about 40% of the time, and he only needed to get him about 28% of the time.

Cain should have gone to third.

Monday, August 12, 2019

Catch Probability and Abandon Rates

Statcast Intern Kristen Austin had a sensational presentation at Saber Seminar and you can see her presentation in its entirety here.  One of the topics she briefly discussed was Abandon Rates.  I asked her to expand upon it in a blog post, and so she has this PDF that she is sharing with us.

(8) Comments • 2019/08/16 • Fielding

Monday, June 17, 2019

Statcast Lab: Creating the Jump Metric

How do I create a metric, and more specifically, how did the Jump metric come about? There is a lot of art and science to the process of metric creation. For the pure artists, sorry, but we need some science. For the pure scientists, sorry, but we need some art.

What I am always trying to do is organize, classify, categorize the data. We do this so we can actually speak the data. For example, we can create a function of exit speed to create a "hardest hitter". That function would likely be a quadratic function of some sort. Or, we can say "batted ball hit at least 95 mph". As much as the scientists want that function, and as much as I want it as well (which you can see here), it's too hard to speak that. 95+ is ubiquitous. And, just as important, it's an excellent proxy for that function. If we can speak it, with little loss of accuracy, then speak it. On other occasions, I can't do it, and so, I go all-in on creating a function (or series of functions), which is what Catch Probability is. Though even there, I try to come up with a shorthand, such as each foot affects the Catch Probability by 4%.

For Jump specifically, the primary decision is whether to represent the unit in time or distance. Do we want to show that Kiermaier is a certain number of seconds quicker than average, or a certain number of feet quicker than average? And by seconds, I mean, tenths of seconds. As I tried both ways, it became clear that I had to represent it in feet. No one can appreciate what 0.1 or 0.2 seconds means. Everyone can appreciate what 3 feet means. If a player JUST misses a catch, we don't say "he missed it by 0.1 seconds". We DO say "he missed it by a step" (or 3 feet). We can freeze a play and see that distance, but not that time. Anyway, so it became clear that the result had to be in feet.

Once that decision is made, then the other choice is a given: the selection must be made in seconds. In other words, if the unit you create is expressed by time, then the data must be partitioned by distance. And if your unit is expressed in distance, then partition the data by time. This is critical. If you don't see it, you will when you create your own metrics.

Knowing that time is the partition, now we need to select thresholds. We do this because we need to organize, classify, categorize the data. Virtually all catches are made with 3+ seconds from pitch release to catch. This becomes my first point of reference: let's focus on Jump solely based on performance in the first (up to) 3 seconds. It might have been 2 or 2.5 or 2.8. As I tried different ways, 3 seconds became the threshold.

The next thing is what we mean by Jump. And we actually had a few components. After many discussions with the rest of the Statcast team, principally Mike, Jason, Travis, Cory, Matt, we finally settled on three: Reaction, Route, Burst.

It was especially with discussions with Mike that cemented the process. We had a few discussions on whether going "the right way" is needed for Reaction and Burst. Once we decided that Route would encapsulate going "the right way", the other two pieces fell into place quickly.

Burst was interesting because at the same time, Travis was working on speed components for batter-runners, other than Sprint Speed. And since Sprint Speed uses the same scale, and can be compared between batter-runner, runners-on-base, fielders, it was highly desirable, if not necessary, that the same applies for Burst. We quickly settled on 1.5 seconds as the time window for Burst, for batter-runner. And given that I had already established 3 seconds for the Jump window, chopping that into two windows, of 0 to 1.5 (Reaction) and 1.5 to 3.0 (Burst), came into being very quickly. In addition, the Burst Distances for fielders at 1.5 to 3.0 is similar to the Burst Distances for time threshold for batter-runner that we chose. It all came into place.

Reaction was purely distance travelled in the first 1.5 seconds, regardless of direction. Burst was the next 1.5 seconds, also regardless of direction. Route was the bridging metric that was the difference between distance travelled and distance covered. And therefore, Jump is the total distance covered (not travelled) in the first 3 seconds, in the correct direction.
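The relationship between the three components can be sketched as follows. This is only an illustration of the definitions in the paragraph above: the sign convention (Route as covered minus travelled, so never positive), the function name, and the sample distances are all assumptions, and the sketch works in raw feet rather than feet versus average.

```python
def jump_components(reaction_ft, burst_ft, covered_ft):
    """Split the first 3 seconds into Reaction, Burst, and Route.

    reaction_ft: distance travelled from 0 to 1.5 seconds, any direction
    burst_ft:    distance travelled from 1.5 to 3.0 seconds, any direction
    covered_ft:  distance gained in the correct direction over 3 seconds
    """
    travelled_ft = reaction_ft + burst_ft
    # Assumed sign convention: Route is the feet lost to an indirect path.
    route_ft = covered_ft - travelled_ft
    jump_ft = reaction_ft + burst_ft + route_ft   # equals covered_ft
    return {"reaction": reaction_ft, "burst": burst_ft,
            "route": route_ft, "jump": jump_ft}

# Hypothetical fielder: travels 12 ft, then 22 ft, but gains only 31.5 ft.
example = jump_components(12.0, 22.0, 31.5)
print(example)
```

Under this convention a fielder with a quick reaction and an indirect route shows a large Reaction value offset by a negative Route value.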

Now, just because all of this came into place and seemed to make sense wasn't enough. We need the metric to actually represent something about the fielder. Once we saw Jackie Bradley Jr being on the leaderboards with both quick reaction and indirect route, year after year, we knew we had it. And then seeing the results of other players, the very strong correlation year to year, it all came into place.

The last step was actually the longest: productionize the metric. We had to get this into the pipeline for our various endpoints. We had to get Daren to add his magic with Savant to take what is essentially tabular data and make it resonate with the fans. Mike had to do all the research to come up with a sabermetric staple of an article, one that is both relevant and timeless.

Anyway, so that's the process for metric creation in general, and for Jump in particular.

(7) Comments • 2019/06/21 • Fielding Statcast

Wednesday, May 15, 2019

Statcast Lab: impact of the wall and/or going back on Catch Probability

One of the team members was asking me how it is possible that the wall and/or going back can have such a dramatic effect on Catch Probability.  And he showed me an example, which was a pretty dramatically different number.  There are four main variables for Catch Probability:

  1. How far does the fielder have to run from his starting point to the (eventual) landing point?
  2. How much time does he have to get there?
  3. Does he need to run back?
  4. Is the wall an impediment to making the play?

For this illustration, I will show you the actual results, as well as the estimated catch probability, for plays where the fielder has to run 80 to 90 feet, with an opportunity time (pitch release to landing) of 4.5 to 5.0 seconds, with the 4 combinations of wall and/or back.

To read the first line: we have 1101 plays since 2016 where the fielder had to cover 80 to 90 feet in 4.5 to 5.0 seconds, where he did not have to run back, nor was the wall an impediment.  The Estimated Catch Probability was 54%, while the actual catch rate under those conditions was 55%.  The last line shows that the outfielder had to run back and that the wall was an impediment.  Under those conditions, they caught the ball 3% of the time, compared to an estimated 4%.  

I used the above example because that was the test case that I was asked.  The results were pretty good.  Almost as good if I check similar conditions, like so:

This one is an extra 0.5 seconds of opportunity time to make the play.  Not nearly as good, but still pretty good.  Also note that those 0.5 seconds add 30% to 60% to the chance of making the out.

The rough rule of thumb is that for plays in the sweetspot, 1 foot = 4% and 0.1 seconds = 10%.  It obviously tapers off when the catch probability is closer to 0% and 100%. 
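That rule of thumb can be written as a simple linear adjustment. This is a sketch of the shorthand only, not the Catch Probability model itself: it clamps at 0% and 100% instead of tapering, and the function name is illustrative. The 54% starting point is the estimate from the example above.

```python
PCT_PER_FOOT = 0.04            # each extra foot to cover costs about 4%
PCT_PER_TENTH_SECOND = 0.10    # each extra 0.1 seconds adds about 10%

def adjust_catch_probability(base, extra_feet=0.0, extra_seconds=0.0):
    """Shift a sweet-spot catch probability by the rule of thumb.
    Simplification: hard clamp to [0, 1]; the real curve tapers near the ends."""
    adjusted = (base
                - PCT_PER_FOOT * extra_feet
                + PCT_PER_TENTH_SECOND * (extra_seconds / 0.1))
    return min(1.0, max(0.0, adjusted))

two_feet_farther = adjust_catch_probability(0.54, extra_feet=2)
tenth_second_more = adjust_catch_probability(0.54, extra_seconds=0.1)
print(f"{two_feet_farther:.2f} {tenth_second_more:.2f}")
```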

Below you will find all the data plotted out.

(13) Comments • 2019/05/21 • Fielding Statcast

Monday, January 28, 2019

Statcast Lab: How much influence do pitchers have on batted balls? (part 1)

This blog post will just be about Justin Verlander, and the focus is only on the outfield. We have 217 batted balls that were either assigned to an outfielder or unplayable by an outfielder (either it was too far for even the best outfielder to make a play, or it hit the wall high). Of those, 167 were caught, or 77%. He benefited very slightly from his outfielders, who combined were +2 outs above average. In other words, we estimate, based on the batted balls he allowed, relative to the fielding alignment and parks they were in, that 165 of those balls would be caught. So, his xOuts (among outfield plays only, or xOutsOF if you will) is 76%.

In part 2, we'll look at all the pitchers. (I have no idea where he ranks, since I have yet to run it for anyone else.)

Here's the breakdown of the 217 batted balls:

  • 34: 34 hits, all impossible to catch
  • 13: 12 hits, 1 out, at under 50% catch probability (average of 15%; in other words, we would have expected two to be caught, but only one was)
  • 10: 1 hit, 9 outs, at 50-75% catch probability (average of 66%; expect 7 to be caught but 9 were)
  • 15: 2 hits, 13 outs, at 75-90% catch prob (average of 86%; expect 13, and 13 caught)
  • 18: 1 hit, 17 outs, at 90-99% catch prob (average of 95%, expect 17, and 17 caught)
  • 127: 0 hits, 127 outs, at 99%+ (average of 99.5%, expect 126, and 127 caught)

All in all, we see that 127 were pure gimmes, and 34 were pure auto-hits (gimme outs and takey hits). In other words, Verlander managed to get 161 of 217, or 74%, of the batted balls to not involve any fielder skill (other than possibly positioning). It's the other 56 batted balls where there's some kind of fielder skill involved. 
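The breakdown can be checked against the totals quoted earlier in the post. The tuple layout is just a convenient way to hold the bucket figures; the impossible-to-catch bucket is given a catch probability of 0.

```python
# (label, batted balls, outs, average catch probability)
buckets = [
    ("impossible", 34, 0, 0.0),
    ("under 50%", 13, 1, 0.15),
    ("50-75%", 10, 9, 0.66),
    ("75-90%", 15, 13, 0.86),
    ("90-99%", 18, 17, 0.95),
    ("99%+", 127, 127, 0.995),
]

total_balls = sum(n for _, n, _, _ in buckets)
total_outs = sum(outs for _, _, outs, _ in buckets)
expected_outs = sum(n * prob for _, n, _, prob in buckets)
outs_above_average = total_outs - expected_outs

# Balls that involve no fielder skill: auto-hits plus pure gimmes.
no_skill_balls = buckets[0][1] + buckets[-1][1]
no_skill_share = no_skill_balls / total_balls

print(f"{total_outs} of {total_balls} caught ({total_outs / total_balls:.0%})")
print(f"expected outs: {expected_outs:.0f}, OAA: {outs_above_average:+.0f}")
print(f"no fielder skill: {no_skill_balls} ({no_skill_share:.0%})")
```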

More to come...

(1) Comments • 2019/01/28 • Fielding Statcast

Friday, January 18, 2019

Shift v NoShift by team

I highlighted this terrific research on Twitter two weeks ago, but my comments there are ephemeral, and this research really should get the exposure it deserves.

Also when you look at pitchers you should control for that too, similar to batters. Verlander for example is not shifted the same amount as the other Astros pitchers. And limiting it to bases empty is a good idea. It's a very controlled environment, and 57% of PA occur with bases empty, so we won't suffer from lack of sample size.

Every layer you peel, you will find two more layers underneath. Lots of good stuff to uncover, keep going! If ever you get 2 more questions for each answer you get, then you are on the right path. Once you get to the point that you have no more questions, then that's a sign you hit a dead end.

Saturday, January 05, 2019

Runs on the Knight’s Watch

A continuation of a conversation from Twitter.  Read that first.  Please.  Pretty please with a cherry on top.

***

This is what is perplexing the saber community when it comes to separating fielding from pitching: we can identify WHO is there, but we can't assign RESPONSIBILITY well enough. You start with simply ONE game. You have a perfect game, which is 4 runs better than average and 5 runs better than replacement. But is the pitcher responsible for ALL of it? We've watched enough baseball to appreciate that there's a lot of randomness. So, are perfect games usually 3 runs or 2 runs better than average for a pitcher? And are they 1 or 2 runs better than average for fielders? And how much goes to pure randomness? 0? 1? 4?

So that randomness, while it starts to wash away over a season, doesn't completely wash away.

Jack Kralick in 1961 had this split with bases empty and runners on, respectively:

.292/.341/.429

.253/.297/.358

The OPS of those numbers is 14% higher than league with bases empty and 22% lower than league with runners on. And the Leverage Index with runners on is 2x that of bases empty.

So you have a pitcher that is substantially better... correction... a pitcher who has been ASSIGNED a performance record substantially better when it counts the most.  And this explains why, when he's on the mound, he has among the lowest RA/9 in the league.

Do we want to credit Kralick with being on the mound getting better results with men on base, thereby limiting the impact of guys who got on base?

In other  words: do we care about sequencing?

Or, do we prefer a "seasonal component" ERA, one that ASSUMES all performance is random in terms of the base-out state?

This was in effect "clutch pitching".  Or "clutch results".  And if we are trying to account for 101 runs allowed, and not the 110 or 120 (or  whatever it is) that randomness would expect, then someone has to absorb that good result.  

And you either give it to Kralick  and/or his fielders and/or create a "timing-Kralick" bucket that acknowledges there was some 10 or whatever runs that were earned "on the knight's watch", but we don't know what to do with it.

Bill's methods are all about accounting for all those runs.  So, we have to account for them, somewhere. 

***

Fangraphs takes a polar opposite view: it assumes randomness of events and targets ONLY a pitcher's BB, SO, HR, and HBP. The rest are essentially assigned to fielders and/or timing.

***

The true answer is somewhere in between, and since I know that we'll never come to a consensus, I simply take a 50/50 blend of rWAR and fWAR and call it a day.

My Game Score v2 is in fact (a simplification of) that middle ground.

(1) Comments • 2019/01/05 • Fielding Pitchers

Tuesday, December 11, 2018

How to look at Statcast fielding data on Savant using Andrew McCutchen

Andrew McCutchen was one of our sample players when I was developing Catch Probability. He and Billy Hamilton were our go-to guys.

First thing you want to do is figure out how fast of a runner he is. And Cutch is pretty fast. At 28.7 feet/second, he's 77th out of 549, or at the 86th percentile. You can also go to his running page, and see he's close to there every year since 2015. With that information, we can go to his fielding page, where we have I think the best fielding chart around.

Notice the axes: you have time on the y-axis, in seconds, and distance on the x-axis, in feet. In other words, the slope of a line drawn on a distance-time graph will represent speed. And we can therefore superimpose his 28.7 feet/sec speed onto this chart. You see all those gray dots below the red line? Those are all balls that were uncaught. Which makes sense: even with his speed, he can't get to those. Some guys COULD if they get a better jump, but Cutch is not one of those guys. That's ok to some degree, as long as he gets the balls above the red line. And there are a lot of them uncaught there. That's the more concerning part. A lot of the uncaughts are short flyballs, which you can see at under 40 feet and under 4 seconds. Those are reaction plays or confidence plays. But there are others as well that are uncaught. And overall, Cutch was near the bottom, at minus 11 outs above average, with only two 4+ star catches. Since 2016, he's at minus 26 outs above average.

If you want to see what a superlative chart looks like, check out Ender Inciarte in 2018 or Byron Buxton in 2017.

(5) Comments • 2018/12/13 • Fielding Statcast

Thursday, November 22, 2018

Statcast Lab: The Good, The Bad, and The Relevant of Outfield Fielding

These outfield charts are my favorite. I call them SpeedLine charts. You can see them on Savant. (I added the red and orange lines. You'll see why in a second.) Here's Harper and Inciarte.


So, what is it that we see here? First look at the axes, which are time and distance. You can skip the next paragraph if you are math averse, but then you will have to trust me. Please don't skip it. I will make it as appealing as I can.

Math Interlude: "Rise over run". Do you remember that in math class? It simply means that if you look at any sloped straight line, you can pick any two points, and the ratio of the amount of rise (going up the y-axis) to the amount of run (going across the x-axis) will be CONSTANT. The value of this ratio is what we call... the slope. And the UNITS of this slope are simply whatever the units of the rise are (in this case seconds) over the units of the run (in this case feet). The red line you see has a rise of 5 seconds, and a run of 140 feet (which is a ratio of 1 to 28). Or if you focus on a one-second segment of rise (say from 3 seconds to 4 seconds), you have a rise of 1 second and a run from about 42 feet to 70 feet, or 28 feet of run. Hence, the slope of this line is 1 second per 28 feet, which we'll call 28 feet per second. The orange line has the same slope, and so has the same rise/run. Any line parallel to the red line represents 28 feet per second.

The Good... Inciarte

As we know, the slope of a distance-time chart is speed. When Inciarte always runs at 28 feet per second, and always gets a great jump, this is represented by the red line. And so, if he has more time, or less distance than needed, he'll get to the ball (running at full speed with a great jump). If the ball is not in the air long enough and/or the ball is hit farther than he can get to it, then he won't get to the ball, no matter how much he tries. It is not humanly possible for him. EVERY SINGLE BALL below the red line is uncaught. Those gray dots you see? Those are balls that are outside of his human limits.

The orange line represents Inciarte's 28 feet per second Sprint Speed, except with an ordinary jump (that's why the intercept point is at 2 seconds, whereas the red one is at 1.5 seconds). Except for one ball, every single ball above the orange line Inciarte caught.

So what have we learned about Inciarte so far? When we plot all the batted balls on a feet v seconds chart, we can superimpose a slope based on his Sprint Speed (of 28 feet per second), setting the intercept at either 1.5 seconds (to represent a great jump) or 2.0 seconds (to represent an ordinary jump). And by doing that, we can isolate all the easy-for-him plays and all the impossible-for-him plays.
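Here is a minimal sketch of the two SpeedLines as code. The 28 feet per second and the 1.5 and 2.0 second intercepts come from the text; the function names, the labels, and how a ball sitting exactly on a line is treated are my own assumptions for illustration.

```python
SPRINT_SPEED_FPS = 28.0   # feet per second, from the post
GREAT_JUMP_S = 1.5        # intercept of the red line, in seconds
ORDINARY_JUMP_S = 2.0     # intercept of the orange line, in seconds


def time_needed(distance_ft, jump_s, speed_fps=SPRINT_SPEED_FPS):
    """Seconds needed to cover distance_ft: the jump delay plus running time."""
    return jump_s + distance_ft / speed_fps


def classify(distance_ft, hang_time_s):
    """Place a batted ball relative to the red and orange lines."""
    if hang_time_s < time_needed(distance_ft, GREAT_JUMP_S):
        return "impossible-for-him"   # below the red line
    if hang_time_s >= time_needed(distance_ft, ORDINARY_JUMP_S):
        return "easy-for-him"         # on or above the orange line (assumed boundary)
    return "in-between"               # between the red and orange lines


# The red line passes through (42 ft, 3 s) and (70 ft, 4 s), as in the math interlude.
print(time_needed(42, GREAT_JUMP_S), time_needed(70, GREAT_JUMP_S))
for hang_time in (3.9, 4.2, 4.6):
    print(70, hang_time, classify(70, hang_time))
```

For a ball 70 feet away, 3.9 seconds of hang time is out of reach, 4.2 seconds needs a better-than-ordinary jump, and 4.6 seconds is a play he should make.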

In-between is the fun, and we can see that he catches most of those. You can tell because the orange dots representing catches far outnumber the gray dots representing uncaught batted balls. This disproportionate ratio means that he gets better jumps than ordinary.

The Bad... Harper

Inciarte is slightly faster than Harper, but they are pretty close. So we can use the same slope line for Harper. We can see this borne out: all the dots below the red line are gray. These are the impossible-for-him plays and they are in fact uncaught. Now check out the dots above the orange line. There's a smattering of gray dots, uncaught balls that are catchable. Remember, all these balls would be caught if he had an ordinary jump and ran at his personal speed. And even then, he is missing several. Finally, the in-between plays, those between ordinary effort and all-out effort. Whereas Inciarte had mostly orange dots rather than gray, Harper is reversed: he's got a lot more uncaught than caught balls.

Why is Harper not getting to them? Put simply: Inciarte is one of the best, if not THE best, fielding outfielders in baseball, even though he's got average speed for an outfielder. Inciarte gets good jumps, good routes, and he applies his speed. What does it mean to apply speed? It means that he doesn't pull back. He's fearless. Darin Erstad was like that too. When you couple fearless play with terrific instincts, even with barely above average speed, this is enough to be a Gold Glove outfielder. Harper, as we saw in this terrific article by Mike Petriello, does not have anything close to those same instincts. And this is why in Outs Above Average using Catch Probability, Inciarte is +21 and Harper is -12. There's a 33 play gap here, and you can see it by focusing on the gray dots above the orange line, and those between the red and orange lines.

And the Relevant... Feet/Second

And this is why we present the chart in feet and seconds, and this is why we present speed in feet per second. It is totally relevant to how players play, how we see the players play, and how we evaluate the play of the players. You create metrics by making them relevant to what it is that you are measuring. Everything about fielding is about feet and seconds. Presenting running speed as MPH is to totally miss the point of relevance. MPH is a dead end. In order for me to take the unrelatable-to-fielding 19 MPH and make it relevant, I'd have to first convert to feet per second, which would then allow me to superimpose his speed on the extremely appealing and relevant distance-time SpeedLine charts we see. And so, we ignore the dead-end MPH, and rely on the relevant unit of feet per second. And that's why when you create a metric, you make it relevant to the thing you are actually seeing and evaluating. You make a metric relevant by relating it to the thing you actually care about.
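For anyone who does start from MPH, the conversion is a one-liner. This is just the standard unit conversion (5,280 feet per mile, 3,600 seconds per hour); the function names are mine.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_fps(mph):
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


def fps_to_mph(fps):
    """Convert feet per second to miles per hour."""
    return fps * SECONDS_PER_HOUR / FEET_PER_MILE


print(f"19 MPH   = {mph_to_fps(19):.1f} feet per second")
print(f"28 ft/s  = {fps_to_mph(28):.1f} MPH")
```

So 19 MPH is a bit under 28 feet per second, and only the feet-per-second figure can be drawn directly onto a feet-and-seconds chart.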

Tuesday, September 11, 2018

2018 Fans Scouting Report, at Fangraphs

Thanks to the generosity of David and his team at Fangraphs, they are continuing to host the Fans Scouting Report, now in its 16th consecutive season!

Help me, help you, help everyone else, and vote for your team:

http://www.fangraphs.com/fanscouting

Monday, September 10, 2018

Statcast Lab: Naive Models 1, 2, 3 for fielding: where you Stand

Last time, I introduced the field slices for fielders (see below).

To recap that: Rather than rely on the official position of a fielder, we instead rely on their ROLE on the field. So if a fielder is standing at Role 6.2 (to the right side of the typical SS position), we don't care if he is officially a SS, 3B, or 2B. Or even a LF playing in the infield, who maintains his LF designation. Analytically, we care about roles, not positions.

Interlude Start

A little interlude in metric creation. You can ignore all this if you are pressed for time. I'll let you know when to come back in.  

Fifteen years ago, I was convinced I could do a better forecasting model than whatever was out there. Seemed like a math problem to solve, and from as far back as I can remember, I've loved math, and I've loved sports (baseball and hockey mostly, and football too... not basketball though... not sure why) and I've loved programming. I was basically in an ideal position to do this. And I threw everything into the kitchen sink on that. And I came out with what I thought was a great forecasting system.

Then I compared it to what was out there and... it wasn't much better, if at all. So, I went back to the drawing board, and stripped everything down to the bare essentials, which turned out to be: (a) three seasons, weighting more recent more, (b) age, (c) regression toward the mean. And that was it. Everything else, including speed, park, earlier seasons, different weighting by components, playing time change... all of it... just was marginal gains.

I then decided to introduce The Marcels, to set the benchmark of what everyone else had to beat, by using a laughably simple algorithm. Which by the way, was better than most of what was out there. The biggest achievement of The Marcels was simply to clear the floor of the bad systems, so that the good systems like Oliver, Chone, Steamer, MGL, Voros could shine. 

Interlude End

Summarizing the interlude: it's critical to start with a naive model, before we put the whole kitchen in there.

I will now introduce four Naive models to set the landscape as I develop an Infielder model in the coming weeks and months. For this blog post, I'm going to focus on outfielders, even though we have Catch Probability. Catch Probability is very much an Enhanced Model, not naive. But in my rush to present the Enhanced Model, I was never able to show the gains of the Enhanced Model over a Naive Model. I should have shown what a Naive Model looks like.

We first start off with factual information, or what we think is factual information. For every batted ball, we assign a single "responsible fielder". While you can try to have multiple fielders, the reality is that it is much cleaner to have one, and you don't gain much by trying to split. Indeed, you bring in complexities and other issues that end up undoing whatever gains you were hoping for.

For outs and errors, it's easy enough: it's the first guy to touch it. For basehits, that's a bit tougher. First you determine if it's an infield or outfield ball, which we determine based on landing distance of 200 feet. Once you figure that out, then you assign it by slice to one of the Roles we mentioned. Whichever fielder was closest to that slice gets the basehit. (It's a bit more involved than that, but not much more.)
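A rough sketch of that assignment rule follows. Only the first-to-touch rule and the 200-foot cutoff come from the text. Everything else is hypothetical: the `Fielder` structure, the spray angles, using the smallest angular gap as a stand-in for "closest to that slice", and counting a ball landing at exactly 200 feet as an outfield ball. The real logic is, as noted, a bit more involved.

```python
from dataclasses import dataclass

OUTFIELD_LANDING_FT = 200  # infield/outfield cutoff given in the post


@dataclass
class Fielder:
    role: str            # main Role, e.g. "6", "8", "9"
    is_outfielder: bool
    angle_deg: float     # hypothetical starting spray angle; negative = toward the LF line


def responsible_fielder(is_hit, first_to_touch, landing_ft, spray_deg, fielders):
    """Return the Role of the single fielder charged with this batted ball."""
    if not is_hit:
        # Outs and errors: the first fielder to touch the ball.
        return first_to_touch
    # Base hits: infield or outfield ball, by landing distance (>= is an assumption).
    outfield_ball = landing_ft >= OUTFIELD_LANDING_FT
    candidates = [f for f in fielders if f.is_outfielder == outfield_ball]
    # Assumption: "closest to that slice" approximated by the smallest angular gap.
    closest = min(candidates, key=lambda f: abs(f.angle_deg - spray_deg))
    return closest.role


# Hypothetical alignment, for illustration only.
alignment = [
    Fielder("5", False, -35.0), Fielder("6", False, -12.0),
    Fielder("4", False, 12.0), Fielder("3", False, 35.0),
    Fielder("7", True, -27.0), Fielder("8", True, 0.0), Fielder("9", True, 27.0),
]

print(responsible_fielder(True, None, 250, 20.0, alignment))   # hit landing in right-center
print(responsible_fielder(True, None, 120, -10.0, alignment))  # hit through the left side
print(responsible_fielder(False, "8", 310, 3.0, alignment))    # caught fly ball
```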

I'll use three outfielders in my examples going forward, based on the 2017 and 2018 seasons.

Inciarte has 911 balls assigned to him (of those we have tracked), of which 713 were caught, a 78% out rate. Hamilton caught 591 of 775, or 76%. Betts is 574/763, 75%. The outs, the numerator, are factual. The denominator, to the extent that how I assigned the base hits can be considered factual, is also factual. So all we've done here is create something akin to OBP. We haven't considered the context. For OBP for example, we'd care about the park, and the opposing pitcher and maybe opposing fielders, as well as differentiating between BB and HR. But that doesn't take away from the factual record of OBP, which is a record of getting on base safely, and number of opportunities. So, what we have so far with the out rates of these three outfielders is a factual record of outs and opportunities.

What models need to do is understand CONTEXT.

Goal post

So that you will get a preview of the end game, I will show their Outs Above Average using the existing Enhanced model you see at Savant:

+40 Ender Inciarte

+30 Billy Hamilton

+26 Mookie Betts

In other words, as we see the results of the Naive Models, we can start to see how naive these models are.

Also note that the above is relative to the average OUTFIELDER. Betts is mostly a RF, while Hamilton and Inciarte are premium CF.

Naive Model 1

We simply establish the out rate by the Infielder Role (IF), Outfielder Role (OF), and Rover Role (RV). The average OF converts 69.7% of his opportunities into outs. These three are well above that average. If we apply that rate to their opportunities, we can get their "Outs Above Average".

Inciarte for example had 911 opportunities. The average outfielder would get 635 outs. Since he actually got 713, that's +78 outs for Inciarte. Note, we haven't talked about the QUALITY of those opportunities. We haven't determined if they were hit right at him, or where he was standing at the start of the play. It's just a very naive model. Repeating for the other two, we have:

+78 Inciarte

+51 Hamilton

+43 Betts

Since I've already shown you the goal posts, you can see we've got a lot of context to address to get from here (Naive Model 1) to there (Enhanced Model, aka Catch Probability).
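Naive Model 1 is simple enough to reproduce in a few lines. This sketch uses the outs, opportunities, and the 69.7% outfielder out rate quoted above; the names are mine. Because 69.7% is itself rounded, a result can land an out away from the figures listed (Betts comes out nearer +42 than +43 this way).

```python
LEAGUE_OF_OUT_RATE = 0.697  # average outfielder out rate quoted in the post (rounded)

# (outs, opportunities) for 2017+2018, from the post
players = {
    "Inciarte": (713, 911),
    "Hamilton": (591, 775),
    "Betts": (574, 763),
}


def naive_outs_above_average(outs, opportunities, context_rate):
    """Actual outs minus what an average fielder would convert on the same number of chances."""
    return outs - opportunities * context_rate


model_1 = {
    name: naive_outs_above_average(outs, opps, LEAGUE_OF_OUT_RATE)
    for name, (outs, opps) in players.items()
}

for name, oaa in model_1.items():
    print(f"{oaa:+.1f} {name}")
```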

Naive Model 2

The next step is to look at each Role. First we'll just look at the main Roles, meaning 7, 8, 9, rather than also their subroles. So, 8.7, 8.1, 8.2, 8.9 (that is, gap in left-CF to gap in right-CF) will be merged into one group. It is, essentially, the same as the official position. But if Billy Hamilton plays at 9.8, he counts in the 9 Role, not 8.

Betts now gets compared to a context of 68%, Inciarte to a context of 76% and Hamilton also to 76%. Is this because Betts is being compared to worse fielders, since the best fielders are in CF? This is part of it, but a small part. The gap in talent between the average RF and the average CF is more on the order of 1 or 2%, not 8%. The larger part is that CF, by and large, get a lot of gimmes. Anyway, so this is where we are:

+56 Betts

+21 Inciarte

+2 Hamilton

Whoa, that's quite the reversal.
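The reversal is just a change of baseline. The sketch below reruns the same subtraction with the Role-specific contexts quoted above (68% for Betts, 76% for Inciarte and Hamilton) next to the single 69.7% outfielder rate. The names are mine, and since the quoted rates are rounded, a result can differ by an out from the figures listed.

```python
OF_RATE = 0.697  # single outfielder context (Naive Model 1), rounded in the post

# name: (outs, opportunities, main-Role context rate), all from the post
players = {
    "Betts": (574, 763, 0.68),
    "Inciarte": (713, 911, 0.76),
    "Hamilton": (591, 775, 0.76),
}

model_2 = {}
for name, (outs, opps, role_rate) in players.items():
    vs_outfielders = outs - opps * OF_RATE   # Naive Model 1 baseline
    vs_role = outs - opps * role_rate        # Naive Model 2 baseline
    model_2[name] = vs_role
    print(f"{name}: {vs_outfielders:+.0f} vs all OF, {vs_role:+.0f} vs his main Role")
```

Nothing about the fielders changed between the two columns. Only the comparison group did.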

I should point out something interesting about the Enhanced Model, and these Naive Models. The Enhanced Model takes the fielder's starting position as... the starting point. In other words, if there is a skill to positioning fielders, the Enhanced Model ignores it. It implicitly assumes that this skill belongs to the team, not the fielder.

The Naive Models we have looked at, however, are not looking at the starting position of the fielder. So, in addition to the contexts we've discussed, we are also crediting the positioning skill to the outfielder.

Naive Model 3

This one is an extension of Model 2: in addition to the main Role, we also include the subRole. So we'll distinguish between out rates for the 8.7, 8.1, 8.2, 8.9 roles. The CF numbers aren't that interesting, but the RF numbers are. Here are the out rates for roles 9.1 (typical RF, toward the left side of the field), 9.0, and 9.2 (typical RF, toward the right):

9.1: 59%

9.0: 68%

9.2: 72%

So if a RF plays a lot toward CF, his out rate is going to be much lower than if he plays closer to the line. This may have something to do with "zone sharing". In any case, by applying these averages to the opportunities each outfielder has made, we can now see their outs above average based on their more specific role:

+64 Betts

+21 Inciarte

+2 Hamilton

The CF don't move much, but Betts gets a jump. So there is SOME positioning skill involved here that we are accounting for, in terms of these high level slices.
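To see why the subRole baseline can move a RF, here is a sketch with the three RF out rates quoted above. The opportunity split is HYPOTHETICAL (the post does not give anyone's split by subRole), and the names are mine. It only illustrates the mechanics.

```python
# Out rates by RF subRole, from the post
SUBROLE_RATE = {"9.1": 0.59, "9.0": 0.68, "9.2": 0.72}
MAIN_ROLE_RATE = 0.68  # main-Role context used for RF in Naive Model 2

# HYPOTHETICAL right fielder who shades toward CF; these counts are illustration only.
opportunities = {"9.1": 200, "9.0": 100, "9.2": 0}


def expected_outs_by_subrole(opps_by_subrole, rate_by_subrole):
    """Expected outs when each opportunity is judged against its own subRole out rate."""
    return sum(n * rate_by_subrole[role] for role, n in opps_by_subrole.items())


total_opps = sum(opportunities.values())
expected_main = total_opps * MAIN_ROLE_RATE
expected_sub = expected_outs_by_subrole(opportunities, SUBROLE_RATE)

print(f"expected outs vs main Role: {expected_main:.0f}")
print(f"expected outs vs subRoles:  {expected_sub:.0f}")
print(f"shift in outs above average: {expected_main - expected_sub:+.0f}")
```

With the same actual outs, a lower expected total means a higher outs above average, which is the direction a RF who plays toward CF would move.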

Naive Model 4

The first three models were all based on their Roles, where they Stand. This naive model will now include the landing spot of the ball, meaning their Function. In effect, the Stand-and-Land Naive Model.

Here we look to see how close the ball is hit to where the fielder is standing. Note, and this is important, it's only based on the slice, the spray angle. We are not, in this naive model, considering DEPTH. So, it's not how much he has to run in or back, but simply the number of degrees side to side.

+63 Betts

+23 Inciarte

+1 Hamilton

So, our first setback if you will, or more accurately, a "useless" step. Though every step is useful, since we have to know the magnitude of their impact, even if it's limited. We learned nothing new about our outfielders (at least these 3), based on the spray angle needed to cover. Since we know Catch Probability looks at number of feet to cover (as well as hang time! very VERY important), as well as the wall and the direction (back or not), then we've got a long way to go from Naive to Enhanced. We're not close to getting it at this point of our Naive models. But we'll get there.

Next Step

The more important point I want to make is that when I present the Infielder Naive Models, there's going to be a lot of runway for us to get through to get to our Enhanced Model. Just like we know Betts has to get from +63 down to +26, and Hamilton will go from +1 to +30, and Inciarte from +23 to +40 by considering the necessary contexts, so will we see the same (presumably) for infielders.

More to come...


(4) Comments • 2018/09/10 • Fielding Statcast