Fielding
Monday, August 12, 2019
Statcast Intern Kristen Austin had a sensational presentation at Saber Seminar, and you can see her presentation in its entirety here. One of the topics she briefly discussed was Abandoned Rates. I asked her to expand upon it in a blog post, and so she has this PDF that she is sharing with us.
Monday, June 17, 2019
How do I create a metric, and more specifically, how did the Jump metric come about? There is a lot of art and science to the process of metric creation. For the pure artists, sorry, but we need some science. For the pure scientists, sorry, but we need some art.
What I am always trying to do is organize, classify, categorize the data. We do this so we can actually speak the data. For example, we can create a function of exit speed to identify a "hardest hitter". That function would likely be a quadratic function of some sort. Or, we can say "batted ball hit at least 95 mph". As much as the scientists want that function, and as much as I want it as well (which you can see here), it's too hard to speak that. 95+ is ubiquitous. And, just as important, it's an excellent proxy for that function. If we can speak it, with little loss of accuracy, then speak it. On other occasions, I can't do it, and so, I go all-in on creating a function (or series of functions), which is what Catch Probability is. Though even there, I try to come up with a shorthand, such as each foot affects the Catch Probability by 4%.
For Jump specifically, the primary decision is whether to represent the unit in time or distance. Do we want to show that Kiermaier is a certain number of seconds quicker than average, or a certain number of feet quicker than average? And by seconds, I mean, tenths of seconds. As I tried both ways, it became clear, I had to represent it in feet. No one can appreciate what 0.1 or 0.2 seconds means. Everyone can appreciate what 3 feet means. If a player JUST misses a catch, we don't say "he missed it by 0.1 seconds". We DO say "he missed it by a step" (or 3 feet). We can freeze a play and see that distance, but not that time. Anyway, so it became clear that the result had to be in feet.
Once that decision is made, then the other choice is a given: the selection must be made in seconds. In other words, if the unit you create is expressed by time, then the data must be partitioned by distance. And if your unit is expressed in distance, then partition the data by time. This is critical. If you don't see it, you will when you create your own metrics.
Knowing that time is the partition, now we need to select thresholds. We do this because we need to organize, classify, categorize the data. Virtually all catches are made with 3+ seconds from pitch release to catch. This becomes my first point of reference: let's focus on Jump solely based on performance in the first (up to) 3 seconds. It might have been 2 or 2.5 or 2.8. As I tried different ways, 3 seconds became the threshold.
The next thing is what we mean by Jump. And we actually had a few components. After many discussions with the rest of the Statcast team, principally Mike, Jason, Travis, Cory, Matt, we finally settled on three: Reaction, Route, Burst.
It was especially the discussions with Mike that cemented the process. We had a few discussions on whether going "the right way" is needed for Reaction and Burst. Once we decided that Route would encapsulate going "the right way", the other two pieces fell into place quickly.
Burst was interesting because at the same time, Travis was working on speed components for batter-runners, other than Sprint Speed. And since Sprint Speed uses the same scale, and can be compared between batter-runners, runners-on-base, and fielders, it was highly desirable, if not necessary, that the same applies for Burst. We quickly settled on 1.5 seconds as the time window for Burst, for the batter-runner. And given that I had already established 3 seconds for the Jump window, chopping that into two windows, of 0 to 1.5 (Reaction) and 1.5 to 3.0 (Burst), came into being very quickly. In addition, the Burst distances for fielders from 1.5 to 3.0 seconds are similar to the Burst distances for the batter-runner time threshold that we chose. It all came into place.
Reaction was purely distance travelled in the first 1.5 seconds, regardless of direction. Burst was the next 1.5 seconds, also regardless of direction. Route was the bridging metric that was the difference between distance travelled and distance covered. And therefore, Jump is the total distance covered (not travelled) in the first 3 seconds, in the correct direction.
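To make those definitions concrete, here is a minimal sketch in Python. It is not the Statcast implementation. It assumes hypothetical tracking samples of (time, x, y) in seconds and feet, with a sample at exactly 1.5 seconds, and it assumes "distance covered" means how much closer the fielder got to the landing point. The function and field names are made up for illustration.

```python
import math

def path_length(points):
    """Total distance travelled along a sequence of (x, y) positions, in feet."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def jump_components(samples, landing, split=1.5, window=3.0):
    """samples: (t_seconds, x_feet, y_feet) tuples sorted by time, starting at t=0.
    landing: (x_feet, y_feet) of the eventual landing point (hypothetical input)."""
    first = [(x, y) for t, x, y in samples if t <= split]
    second = [(x, y) for t, x, y in samples if split <= t <= window]
    reaction = path_length(first)   # feet travelled from 0.0 to 1.5 s, any direction
    burst = path_length(second)     # feet travelled from 1.5 to 3.0 s, any direction
    start, end = first[0], second[-1]
    # Assumption: "covered" = reduction in distance to the landing point.
    covered = math.dist(start, landing) - math.dist(end, landing)
    # Route bridges travelled and covered: zero for a direct path, negative otherwise.
    route = covered - (reaction + burst)
    return {"reaction": reaction, "burst": burst, "route": route, "jump": covered}

# A fielder who drifts sideways before correcting course.
indirect = [(0.0, 0.0, 0.0), (1.5, 3.0, 4.0), (3.0, 0.0, 8.0)]
print(jump_components(indirect, landing=(0.0, 40.0)))
```

With this bookkeeping, Reaction + Burst + Route always adds up to Jump, which is the relationship described above.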
Now, just because all of this came into place and seemed to make sense wasn't enough. We need the metric to actually represent something about the fielder. Once we saw Jackie Bradley Jr being on the leaderboards with both quick reaction and indirect route, year after year, we knew we had it. And then seeing the results of other players, the very strong correlation year to year, it all came into place.
The last step was actually the longest: productionize the metric. We had to get this into the pipeline for our various endpoints. We had to get Daren to add his magic with Savant to take what is essentially tabular data and make it resonate with the fans. Mike had to do all the research to come up with a sabermetric staple of an article, one that is both relevant and timeless.
Anyway, so that's the process for metric creation in general, and for Jump in particular.
Wednesday, May 15, 2019
One of the team members was asking me how is it possible that the wall and/or going back can have such a dramatic effect on Catch Probability. And he showed me an example, which was a pretty dramatically different number. There are four main variables for Catch Probability:
- How far does the fielder have to run from his starting point to the (eventual) landing point?
- How much time does he have to get there?
- Does he need to run back?
- Is the wall an impediment to making the play?
For this illustration, I will show you the actual results, as well as the estimated catch probability, for plays where the fielder has to run 80 to 90 feet, with an opportunity time (pitch release to landing) of 4.5 to 5.0 seconds, with the 4 combinations of wall and/or back.
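[Table: plays, Estimated Catch Probability, and actual catch rate for the four combinations of wall and/or back, at 80 to 90 feet and 4.5 to 5.0 seconds]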
To read the first line: we have 1101 plays since 2016 where the fielder had to cover 80 to 90 feet in 4.5 to 5.0 seconds, where he did not have to run back, nor was the wall an impediment. The Estimated Catch Probability was 54%, while the actual catch rate under those conditions was 55%. The last line shows that the outfielder had to run back and that the wall was an impediment. Under those conditions, they caught the ball 3% of the time, compared to an estimated 4%.
I used the above example because that was the test case that I was asked about. The results were pretty good. They are almost as good if I check similar conditions, like so:
[Table: the same four combinations, at 80 to 90 feet and an extra 0.5 seconds of opportunity time]
This one is an extra 0.5 seconds of opportunity time to make the play. Not nearly as good, but still pretty good. Also note that those 0.5 seconds add 30 to 60 percentage points to the chance of making the out.
The rough rule of thumb is that for plays in the sweetspot, 1 foot = 4% and 0.1 seconds = 10%. It obviously tapers off when the catch probability is closer to 0% and 100%.
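As a rough illustration of that rule of thumb, here is a small Python sketch. It is only the shorthand, not the Catch Probability model. The function name is made up, and the hard clamp at 0% and 100% is my own stand-in for the tapering, which the real model handles more gracefully.

```python
def adjust_catch_probability(base, extra_feet=0.0, extra_seconds=0.0):
    """Shift a Catch Probability (0 to 1) using the sweet-spot rule of thumb."""
    per_foot = 0.04           # each extra foot to cover costs about 4 points
    per_tenth_second = 0.10   # each extra 0.1 s of opportunity time adds about 10 points
    shifted = base - per_foot * extra_feet + per_tenth_second * (extra_seconds / 0.1)
    # Assumption: clamp to the valid range, since the linear rule breaks down
    # when the probability gets close to 0% or 100%.
    return min(1.0, max(0.0, shifted))

# Starting from a 54% play: three more feet to cover, or one more tenth of a second.
print(adjust_catch_probability(0.54, extra_feet=3))
print(adjust_catch_probability(0.54, extra_seconds=0.1))
```

The shorthand is only meant for plays in the sweetspot. Once the clamp kicks in, you are outside the range where 1 foot = 4% holds.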
Below you will find all the data plotted out.
Monday, January 28, 2019
This blog post will just be about Justin Verlander, and the focus is only on the outfield. We have 217 batted balls that either were assigned to an outfielder, or were unplayable by an outfielder (either it was too far for even the best outfielder to make a play, or it hit the wall high). Of those, 167 were caught, or 77%. He benefited very slightly from his outfielders, who combined were +2 outs above average. In other words, we estimate, based on the batted balls he allowed, relative to the fielding alignment and parks they were in, that 165 of those balls would be caught. So, his xOuts (among outfield plays only, or xOutsOF if you will) is 76%.
In part 2, we'll look at all the pitchers. (I have no idea where he ranks, since I have yet to run it for anyone else.)
Here's the breakdown of the 217 batted balls:
- 34: 34 hits, all impossible to catch
- 13: 12 hits, 1 out, at under 50% catch probability (average of 15%; in other words, we would have expected two to be caught, but only one was)
- 10: 1 hit, 9 outs, at 50-75% catch probability (average of 66%; expect 7 to be caught but 9 were)
- 15: 2 hits, 13 outs, at 75-90% catch prob (average of 86%; expect 13, and 13 caught)
- 18: 1 hit, 17 outs, at 90-99% catch prob (average of 95%, expect 17, and 17 caught)
- 127: 0 hits, 127 outs, at 99%+ (average of 99.5%, expect 126, and 127 caught)
All in all, we see that 127 were pure gimmes, and 34 were pure auto-hits (gimme outs and takey hits). In other words, Verlander managed to get 161 of 217, or 74%, of the batted balls to not involve any fielder skill (other than possibly positioning). It's the other 56 batted balls where there's some kind of fielder skill involved.
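The bookkeeping behind those numbers can be sketched in a few lines of Python. The bucket values come straight from the breakdown above. Treating the 34 impossible balls as a 0% catch probability is my simplification, and the variable names are just for illustration.

```python
# (batted balls, outs made, average catch probability) for each bucket above
buckets = [
    (34, 0, 0.0),      # impossible to catch (assumed 0% here)
    (13, 1, 0.15),     # under 50%
    (10, 9, 0.66),     # 50-75%
    (15, 13, 0.86),    # 75-90%
    (18, 17, 0.95),    # 90-99%
    (127, 127, 0.995), # 99%+
]

total_balls = sum(balls for balls, _, _ in buckets)
actual_outs = sum(outs for _, outs, _ in buckets)
expected_outs = sum(balls * prob for balls, _, prob in buckets)
outs_above_average = actual_outs - expected_outs

print(f"balls={total_balls} outs={actual_outs} "
      f"xOuts={expected_outs:.1f} OAA={outs_above_average:+.1f}")
print(f"out rate={actual_outs / total_balls:.0%} "
      f"expected out rate={expected_outs / total_balls:.0%}")
```

Summing balls times catch probability across buckets is what gets us to roughly 165 expected outs, against the 167 actually made.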
More to come...
Friday, January 18, 2019
I highlighted this terrific research on Twitter two weeks ago, but my comments there are ephemeral, and this research really should get the exposure it deserves.
Also when you look at pitchers you should control for that too, similar to batters. Verlander for example is not shifted the same amount as the other Astros pitchers. And limiting it to bases empty is a good idea. It's a very controlled environment, and 57% of PA occur with bases empty, so we won't suffer from lack of sample size.
Every layer you peel, you will find two more layers underneath. Lots of good stuff to uncover, keep going!
If ever you get 2 more questions for each answer you get, then you are on the right path. Once you get to the point that you have no more questions, then that's a sign you hit a dead end.
Saturday, January 05, 2019
A continuation of a conversation from Twitter. Read that first. Please. Pretty please with a cherry on top.
***
This is what is perplexing the saber community when it comes to separating fielding from pitching: we can identify WHO is there, but we can't assign RESPONSIBILITY well enough. You start with simply ONE game. You have a perfect game, and so it is 4 runs better than average and 5 runs better than replacement. But is the pitcher responsible for ALL of it? We've watched enough baseball to appreciate that there's a lot of randomness. So, are perfect games usually 3 runs or 2 runs better than average for a pitcher? And are they 1 or 2 runs better than average for fielders? And how much to pure randomness? 0? 1? 4?
So that randomness, while it starts to wash away over a season, doesn't completely wash away.
Jack Kralick in 1961 has this split with bases empty and runners on, respectively:
.292/.341/.429
.253/.297/.358
The OPS of those numbers is 14% higher than league with bases empty and 22% lower than league with runners on. And the Leverage Index with runners on is 2x that of bases empty.
So you have a pitcher that is substantially better... correction... a pitcher who has been ASSIGNED a performance record substantially better when it counts the most. And this explains why, when he's on the mound, he has among the lowest RA/9 in the league.
Do we want to credit Kralick with being on the mound getting better results with men on base, thereby limiting the impact of guys who got on base?
In other words: do we care about sequencing?
Or, do we prefer a "seasonal component" ERA, one that ASSUMES all performance is random in terms of the base-out state?
This was in effect "clutch pitching". Or "clutch results". And if we are trying to account for 101 runs allowed, and not the 110 or 120 (or whatever it is) that randomness would expect, then someone has to absorb that good result.
And you either give it to Kralick and/or his fielders and/or create a "timing-Kralick" bucket that acknowledges there was some 10 or whatever runs that were earned "on the knight's watch", but we don't know what to do with it.
Bill's methods are all about accounting for all those runs. So, we have to account for them, somewhere.
***
Fangraphs takes a polar opposite view, and assumes randomness of events, ONLY targeting the BB, SO, HR, HBP of a pitcher. The rest are essentially assigned to fielders and/or timing.
***
The true answer is somewhere in-between and since I know that we'll never come to consensus, I simply take a 50/50 approach of rWAR and fWAR and call it a day.
My Game Score v2 is in fact (a simplification of) that middle ground.
Tuesday, December 11, 2018
Andrew McCutchen was one of our sample players when I was developing Catch Probability. He and Billy Hamilton were our go-to guys.
First thing you want to do is figure out how fast of a runner he is. And Cutch is pretty fast. At 28.7 feet/second, he's 77th out of 549, or at the 86th percentile. You can also go to his running page, and see he's close to there every year since 2015. With that information, we can go to his fielding page, where we have I think the best fielding chart around.
Notice the axes: you have time on the y-axis, in seconds, and distance on the x-axis, in feet. In other words, the slope of a line drawn on a distance-time graph will represent speed. And we can therefore superimpose his 28.7 feet/sec speed onto this chart. You see all those gray dots below the red line? Those are all the balls that were uncaught. Which makes sense: even with his speed, he can't get to those. Some guys COULD if they get a better jump, but Cutch is not one of those guys. That's ok to some degree. As long as he gets the balls above the red line. And there are a lot of them uncaught there. That's the more concerning part. A lot of the uncaughts are short flyballs, which you can see at under 40 feet and under 4 seconds. Those are reaction plays or confidence plays. But there are others as well that are uncaught. And overall, Cutch was near the bottom, at minus 11 outs above average, with only two 4+ star catches. Since 2016, he's at minus 26 outs above average.
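[Figure: Andrew McCutchen's fielding chart, time in seconds against distance in feet, with his 28.7 feet/sec speed as the red line]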
If you want to see what a superlative chart looks like, check out Ender Inciarte in 2018 or Byron Buxton in 2017.
Thursday, November 22, 2018
These outfield charts are my favorite. I call them SpeedLine charts. You can see them on Savant. (I added the red and orange lines. You'll see why in a second.) Here's Harper and Inciarte.
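[Figure: SpeedLine charts for Harper and Inciarte, time in seconds against distance in feet, with the red and orange lines added]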
So, what is it that we see here? First look at the axes, which are time and distance. You can skip the next paragraph if you are math averse, but then you will have to trust me if you do. Please don't skip it. I will make it as appealing as I can.
Math Interlude: "Rise over run". Do you remember that in math class? It simply means that if you look at any sloped straight line, you can pick any two points, and the ratio of the amount of rise (going up the y-axis) to the amount of run (going across the x-axis) will be CONSTANT. The value of this sloped line is what we call... the slope. And the UNITS of this slope are simply whatever the units of the rise are (in this case seconds) over the units of the run (in this case feet). The red line you see has a rise of 5 seconds, and a run of 140 feet (which is a ratio of 1 to 28). Or if you focus on a one second segment of rise (say from 3 seconds to 4 seconds), you have a rise of 1 second and a run from about 42 feet to 70 feet, or 28 feet of run. Hence, the slope of this line is 1 second per 28 feet, which we'll call 28 feet per second. The orange line ALSO has the same slope, and so has the same rise/run. Any line parallel to the red line represents 28 feet per second.
The Good... Inciarte
As we know, the slope of a distance-time chart is speed. When Inciarte always runs at 28 feet per second, and always gets a great jump, this is represented by the red line. And so, if he has more time, or less distance than needed, he'll get to the ball (running at full speed with a great jump). If the ball is not in the air long enough and/or the ball is hit farther than he can get to it, then he won't get to the ball, no matter how much he tries. It is not humanly possible for him. EVERY SINGLE BALL below the red line is uncaught. Those gray dots you see? Those are balls that are outside of his human limits.
The orange line represents Inciarte's 28 feet per second Sprint Speed, except with an ordinary jump (that's why the intercept point is at 2 seconds, whereas the red one is at 1.5 seconds). Except for one ball, every single ball above the orange line Inciarte caught.
So what have we learned about Inciarte so far? When we plot all the batted balls on a feet v seconds chart, we can superimpose a slope based on his Sprint Speed (of 28 feet per second), setting the intercept at either 1.5 seconds (to represent a great jump) or 2.0 seconds (to represent an ordinary jump). And by doing that, we can isolate all the easy-for-him plays and all the impossible-for-him plays.
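The two lines can be written down as a tiny classifier. This is a minimal Python sketch of the idea, not anything from Savant. The 28 feet per second slope and the 1.5 and 2.0 second intercepts come from the text above, and the function name and labels are made up.

```python
def classify_play(distance_ft, opportunity_s, speed_fps=28.0,
                  great_jump_s=1.5, ordinary_jump_s=2.0):
    """Place a batted ball relative to the red and orange SpeedLines."""
    run_time = distance_ft / speed_fps   # seconds needed at full Sprint Speed
    if opportunity_s < great_jump_s + run_time:
        return "impossible"              # below the red line
    if opportunity_s >= ordinary_jump_s + run_time:
        return "easy"                    # on or above the orange line
    return "in-between"                  # between the two lines

# Three balls hit 70 feet away, with different opportunity times.
for seconds in (3.5, 4.2, 4.6):
    print(seconds, classify_play(70, seconds))
```

A ball 70 feet away needs 2.5 seconds of running at 28 feet per second, so the red line sits at 4.0 seconds and the orange line at 4.5 seconds for that distance.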
In-between is the fun, and we can see that he catches most of those. You can tell because the orange-colored dots representing catches far outnumber the gray dots, which are uncaught batted balls. This disproportionate ratio means that he gets better jumps than ordinary.
The Bad... Harper
Inciarte is slightly faster than Harper, but pretty close. So we can use the same slope line for Harper. We can see this borne out: all the dots below the red line are gray. These are the impossible-for-him plays and they are in fact uncaught. Now check out the dots above the orange line. There's a smattering of gray dots, uncaught balls that are catchable. Remember all these balls would be caught if he had an ordinary jump and he ran at his personal speed. And even then, he is missing several. Finally, the in-between plays, those between ordinary effort and all-out effort. Whereas Inciarte had mostly orange to gray balls, Harper is reversed, and he's got a lot more uncaught than caught balls.
Why is Harper not getting to them? Put simply: Inciarte is one of the best, if not THE best fielding outfielder in baseball, even though he's got average speed for an outfielder. Inciarte gets good jumps, good routes, and he applies his speed. What does that mean to apply speed? It means that he doesn't pull back. He's fearless. Darin Erstad was like that too. When you couple fearless play with terrific instincts, even with barely above average speed, this is enough to be a Gold Glove outfielder. Harper, as we saw in this terrific article by Mike Petriello, does not have anything close to those same instincts. And this is why in Outs Above Average using Catch Probability, Inciarte is +21 and Harper is -12. There's a 33 play gap here, and you can see them by focusing on the gray dots above the orange line, and those between the red and orange lines.
And the Relevant... Feet/Second
And this is why we present the chart in feet and seconds, and this is why we present speed in feet per second. It is totally relevant to how players play, how we see the players play, and how we evaluate the play of the players. You create metrics by making it relevant to what it is that you are measuring. Everything about fielding is about feet and seconds. Presenting running speed as MPH is to totally miss the point of relevance. MPH is a dead end. In order for me to take the unrelatable-to-fielding 19 MPH and make it relevant, I'd have to first convert to feet per second, which would then allow me to superimpose his speed on the extremely appealing and relevant distance-time SpeedLine charts we see. And so, we ignore the dead-end MPH, and rely on the relevant unit of feet per second. And that's why when you create a metric, you make it relevant to the thing you are actually seeing and evaluating. You make a metric relevant by relating it to the thing you actually care about.
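For anyone who does start from MPH, the conversion is just 5280 feet per mile over 3600 seconds per hour. A quick Python sketch, with a made-up function name:

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def mph_to_feet_per_second(mph):
    """Convert a running speed from miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR

# 19 MPH lands right around the 28 feet per second used on the SpeedLine charts.
print(f"{mph_to_feet_per_second(19):.1f} feet per second")
```

That extra step is exactly the point: the feet-per-second number can go straight onto a chart whose axes are feet and seconds.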
Tuesday, September 11, 2018
Thanks to the generosity of David and his team at Fangraphs, they are continuing to host the Fans Scouting Report, now in its 16th consecutive season!
Help me, help you, help everyone else, and vote for your team:
http://www.fangraphs.com/fanscouting
Monday, September 10, 2018
Last time, I introduced the field slices for fielders (see below).
To recap that: Rather than rely on the official position of a fielder, we instead rely on their ROLE on the field. So if a fielder is standing at Role 6.2 (to the right side of the typical SS position), we don't care if he is officially a SS, 3B, or 2B. Or even a LF playing in the infield, who maintains his LF designation. Analytically, we care about roles, not positions.
Interlude Start
A little interlude in metric creation. You can ignore all this if you are pressed for time. I'll let you know when to come back in.
Fifteen years ago, I was convinced I could do a better forecasting model than whatever was out there. Seemed like a math problem to solve, and from as far back as I can remember, I've loved math, and I've loved sports (baseball and hockey mostly, and football too... not basketball though... not sure why) and I've loved programming. I was basically in an ideal position to do this. And I threw everything into the kitchen sink on that. And I came out with what I thought was a great forecasting system.
Then I compared it to what was out there and... it wasn't much better, if at all. So, I went back to the drawing board, and stripped everything down to the bare essentials, which turned out to be: (a) three seasons, weighting more recent more, (b) age, (c) regression toward the mean. And that was it. Everything else, including speed, park, earlier seasons, different weighting by components, playing time change... all of it... just was marginal gains.
I then decided to introduce The Marcels, to set the benchmark of what everyone else had to beat, by using a laughably simple algorithm. Which by the way, was better than most of what was out there. The biggest achievement of The Marcels was simply to clear the floor of the bad systems, so that the good systems like Oliver, Chone, Steamer, MGL, Voros could shine.
Interlude End
Summarizing the interlude: it's critical to start with a naive model, before we put the whole kitchen in there.
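To show how little a naive forecasting model needs, here is a sketch in Python with just the three ingredients named in the interlude: three weighted seasons, age, and regression toward the mean. The specific defaults (5/4/3 weights, 1200 PA of regression, age 29 as the pivot, 0.6% per year) are illustrative assumptions on my part and are not stated in this post. The function name is made up as well.

```python
def naive_forecast(seasons, league_rate, age,
                   weights=(5, 4, 3), regression_pa=1200,
                   pivot_age=29, age_step=0.006):
    """seasons: [(rate, plate_appearances), ...], most recent season first.
    All default parameter values are illustrative assumptions."""
    # (a) three seasons, weighting the more recent ones more heavily
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    weighted_sum = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    # (c) regression toward the mean: blend in league-average plate appearances
    regressed = (weighted_sum + regression_pa * league_rate) / (weighted_pa + regression_pa)
    # (b) age: a small bump for younger players, a small penalty for older ones
    return regressed * (1 + (pivot_age - age) * age_step)

# A hitter with a .400 rate over three 600 PA seasons, in a .330 league, at age 29.
print(round(naive_forecast([(0.400, 600)] * 3, league_rate=0.330, age=29), 3))
```

Everything else (speed, park, component-specific weights) would be layered on top of this, and the benchmark tells you how much each layer actually buys you.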
I will now introduce four Naive models to set the landscape as I develop an Infielder model in the coming weeks and months. For this blog post, I'm going to focus on outfielders, even though we have Catch Probability. Catch Probability is very much an Enhanced Model, not naive. But in my rush to present the Enhanced Model, I was never able to show the gains of the Enhanced Model over a Naive Model. I should have shown what a Naive Model looks like.
We first start off with factual information, or what we think is factual information. For every batted ball, we assign a single "responsible fielder". While you can try to have multiple fielders, the reality is that it is much cleaner to have one, and you don't gain much by trying to split. Indeed, you bring in complexities and other issues that end up undoing whatever gains you were hoping for.
For outs and errors, it's easy enough: it's the first guy to touch it. For basehits, that's a bit tougher. First you determine if it's an infield or outfield ball, which we determine based on landing distance of 200 feet. Once you figure that out, then you assign it by slice to one of the Roles we mentioned. Whichever fielder was closest to that slice gets the basehit. (It's a bit more involved than that, but not much more.)
I'll use three outfielders in my examples going forward, based on 2017+2018 seasons.
Inciarte has 911 balls assigned to him, of which 713 were caught (of those we have tracked), or a 78% out rate. Hamilton caught 591 of 775, or 76%. Betts is 574/763, 75%. The outs, the numerator, is factual. The denominator, to the extent that how I assigned the base hits can be considered factual, is also factual. So all we've done here is created something akin to OBP. We haven't considered the context. For OBP for example, we'd care about the park, and the opposing pitcher and maybe opposing fielders, as well as differentiating between BB and HR. But that doesn't take away from the factual record of OBP, which is a record of getting on base safely, and number of opportunities. So, what we have so far with the out rates of these three outfielders is a factual record of outs and opportunities.
What models need to do is understand CONTEXT.
Goal post
So that you will get a preview of the end game, I will show their Outs Above Average using the existing Enhanced model you see at Savant:
+40 Ender Inciarte
+30 Billy Hamilton
+26 Mookie Betts
In other words, as we see the results of the Naive Models, we can start to see how naive these models are.
Also note that the above is relative to the average OUTFIELDER. Betts is mostly a RF, while Hamilton and Inciarte are premium CF.
Naive Model 1
We simply establish the out rate by the Infielder Role (IF), Outfielder Role (OF), and Rover Role (RV). The average OF converts 69.7% of his opportunities into outs. These three are well above that average. If we apply that rate to their opportunities, we can get their "Outs Above Average".
Inciarte for example had 911 opportunities. The average outfielder would get 635 outs. Since he actually got 713, that's +78 outs for Inciarte. Note, we haven't talked about the QUALITY of those opportunities. We haven't determined if they were hit right at him, or where he was standing at the start of the play. It's just a very naive model. Repeating for the other two and we have:
+78 Inciarte
+51 Hamilton
+43 Betts
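Naive Model 1 is simple enough to fit in a few lines of Python. The outs, opportunities, and the 69.7% average come from the text above, while the function and variable names are made up. Because 69.7% is itself a rounded figure, a result can land one out away from the numbers listed.

```python
AVERAGE_OF_OUT_RATE = 0.697  # average outfielder, from the text

# (outs, opportunities) for 2017+2018
fielders = {
    "Inciarte": (713, 911),
    "Hamilton": (591, 775),
    "Betts": (574, 763),
}

def naive_outs_above_average(outs, opportunities, baseline_rate):
    """Actual outs minus the outs an average fielder makes on the same count."""
    return outs - baseline_rate * opportunities

for name, (outs, opportunities) in fielders.items():
    oaa = naive_outs_above_average(outs, opportunities, AVERAGE_OF_OUT_RATE)
    print(f"{oaa:+.0f} {name}")
```

Nothing in here knows anything about the quality of the opportunities, which is what makes the model naive.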
Since I've already shown you the goal posts, you can see we've got a lot of context to address to get from here (Naive Model 1) to there (Enhanced Model, aka Catch Probability).
Naive Model 2
The next step is to look at each Role. First we'll just look at the main Roles, meaning 7, 8, 9, rather than also their subroles. So, 8.7, 8.1, 8.2, 8.9 (that is, gap in left-CF to gap in right-CF) will be merged into one group. It is, essentially, the same as the official position. But if Billy Hamilton plays at 9.8, he counts in the 9 Role, not 8.
Betts now gets compared to a context of 68%, Inciarte to a context of 76% and Hamilton also to 76%. Is this because Betts is being compared to worse fielders, since the best fielders are in CF? This is part of it, but a small part. The gap in talent between the average RF and the average CF is more on the order of 1 or 2%, not 8%. The larger part is that CF, by and large, get a lot of gimmes. Anyway, so this is where we are:
+56 Betts
+21 Inciarte
+2 Hamilton
Whoa, that's quite the reversal.
I should point out something interesting about the Enhanced Model, and these Naive Models. The Enhanced Model takes the fielder's starting position as... the starting point. In other words, if there is a skill to positioning fielders, the Enhanced Model ignores it. It implicitly assumes that this skill belongs to the team, not the fielder.
These Naive Models, however, are not looking at the starting position of the fielder. So, in addition to the contexts we've discussed, we are also attributing the positioning skill to the outfielder.
Naive Model 3
This one is an extension of Model 2: in addition to the main Role, we also include the subRole. So we'll distinguish between out rates for the 8.7, 8.1, 8.2, 8.9 roles. The CF numbers aren't that interesting, but the RF numbers are. Here's the out rate for roles 9.1 (typical RF, toward the left side of the field), 9.0, and 9.2 (typical RF toward the right):
9.1: 59%
9.0: 68%
9.2: 72%
So if a RF plays a lot toward CF, his out rate is going to be much lower than if he plays closer to the line. This may have something to do with "zone sharing". In any case, by applying these averages to the opportunities each outfielder has had, we can now see their outs above average based on their more specific role:
+64 Betts
+21 Inciarte
+2 Hamilton
The CF don't move much, but Betts gets a jump. So there is SOME positioning skill involved here that we are accounting for, in terms of these high level slices.
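Here is a sketch in Python of how the subRole baseline changes the comparison. The three RF out rates come from the table above. The split of opportunities and the outs total are hypothetical numbers made up for illustration, since the post does not give anyone's split by subRole.

```python
# Average out rate by RF subRole, from the table above
SUBROLE_OUT_RATE = {"9.1": 0.59, "9.0": 0.68, "9.2": 0.72}

def expected_outs_by_subrole(opportunities):
    """opportunities: {subrole: count}. Returns the outs an average fielder
    would make given that same mix of subRoles."""
    return sum(SUBROLE_OUT_RATE[role] * count for role, count in opportunities.items())

# Hypothetical right fielder (not real data): shaded toward CF fairly often.
opportunities = {"9.1": 100, "9.0": 300, "9.2": 100}
actual_outs = 360

expected = expected_outs_by_subrole(opportunities)
print(f"expected {expected:.0f}, outs above average {actual_outs - expected:+.0f}")
```

The more of a fielder's opportunities come at 9.1, the lower his baseline, which is how a RF who plays toward CF a lot can pick up extra outs above average moving from Model 2 to Model 3.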
Naive Model 4
The first three models were all based on their Roles, where they Stand. This naive model will now include the landing spot of the ball, meaning their Function. In effect, the Stand-and-Land Naive Model.
Here we look to see how close the ball is hit to where the fielder is standing. Note, and this is important, it's only based on the slice, the spray angle. We are not, in this naive model, considering DEPTH. So, it's not how much he has to run in or back, but simply number of degrees side to side.
+63 Betts
+23 Inciarte
+1 Hamilton
So, our first setback if you will, or more accurately, a "useless" step. Though every step is useful, since we have to know the magnitude of their impact, even if it's limited. We learned nothing new about our outfielders (at least these 3), based on the spray angle needed to cover. Since we know Catch Probability looks at number of feet to cover (as well as hang time! very VERY important), as well as the wall and the direction (back or not), then we've got a long way to go from Naive to Enhanced. We're not close to getting it at this point of our Naive models. But we'll get there.
Next Step
The more important point I want to make is that when I present the Infielder Naive Models, there's going to be a lot of runway for us to get through to get to our Enhanced Model. Just like we know Betts has to get from +63 down to +26, and Hamilton will go from +1 to +30, and Inciarte from +23 to +40 by considering the necessary contexts, so will we see the same (presumably) for infielders.
More to come...
Monday, August 27, 2018
As I showed two weeks ago, we will be adding "Role" designations to fielders (in addition to maintaining their official position). It takes just a small amount of effort to understand the Fielder Roles. They are grounded in the traditional positions, with additional identifiers of ".1" (left) and ".2" (right). There are a few other slices or zones, and you can kind of figure out the scheme.
Now that we know the role for each player, for each play, and we have assigned each play to one fielder (based on the proximity of the ball to the player), we now need to know their FUNCTION, the landing point of the ball. It's going to be more exciting when I show it for infielders, but let me describe it for outfielders, since it's easier.
[Figure: out rates for outfielders by fielder role and landing slice]
For a fielder who plays in the traditional LF spot (role 7.0), and the ball lands in that same area (landing 7.0), 76.6% of those plays are converted into outs. And if you look at landing 7.1, meaning the ball lands in the slice that is toward the LEFT side of the field, they converted 72.9% of those plays into outs. For balls that land toward the right side of the field (landing 7.2... and remember we are still looking at role 7.0), they convert 70.6% of those plays into outs.
If you look at each of the green boxes, you will see that balls that land in the same slice as the fielder is standing, the conversion rate is highest, and the more the ball lands away from that slice, the fewer outs are made per play. (For the most part anyway.) This is fairly obvious in terms of DIRECTION. What we see here is the MAGNITUDE that this is true.
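One way to hold this kind of Stand-and-Land data is a lookup keyed by (role, landing). The Python sketch below uses only the three role 7.0 rates quoted above. The structure and names are my own, and the rest of the grid would need to be filled in from the chart.

```python
# Out rate keyed by (role the fielder stands in, slice the ball lands in)
OUT_RATE = {
    ("7.0", "7.0"): 0.766,
    ("7.0", "7.1"): 0.729,
    ("7.0", "7.2"): 0.706,
}

def landing_penalty(role, landing):
    """Drop in out rate, in percentage points, versus a ball landing
    in the same slice the fielder is standing in."""
    return 100 * (OUT_RATE[(role, role)] - OUT_RATE[(role, landing)])

for landing in ("7.1", "7.2"):
    print(f"role 7.0, landing {landing}: {landing_penalty('7.0', landing):.1f} points lower")
```

That penalty is the MAGNITUDE being described: for the traditional LF, one slice to the left costs a few points of out rate, and one slice to the right costs a bit more.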
I'll let you digest this for a bit, then I'll post the infielder data.
Sunday, August 26, 2018
A recap of outfielders, infielders, and catchers can be found in part 4 with links to the other 3 parts.
Now the DH. Remember that we've introduced TWO types of positional adjustments: one for defense and one for offense.
- Intra-outfield, we only have defensive adjustments.
- Intra-infield (2B, SS, 3B), we also only have defensive adjustments.
- Inter-IF-OF, it's still a defensive adjustment, but based on the offensive value of their replacement pool.
- Catcher has two distinct adjustments, one for defense and one for offense.
DH is very similar to catcher. It is much harder to hit as a DH than as a non-DH. So, we need to apply an offensive adjustment.
Now, about defense. Obviously, a DH does not "defend" anything. But you have to do... SOMETHING. Take for example Frank Thomas, who has some of the worst stats as a fielder for 1B. Let's say he is -10 runs per season as a fielder for 1B. In the years as a DH, he obviously didn't help or hurt. Or did he? He is taking up a spot as DH and so forcing his team to deploy a 1B. If that fielder as a 1B is -5 runs per season, that fielder has more defensive value than Frank Thomas does.
In the end, the best way to make sense of this is to look at it from the point of view of "defensive value" rather than "fielding value". Fielding means actually fielding. Defense is at a higher level that encompasses more than fielding. It can include pitching. And it can include how to deploy the players, such that someone is at DH.
And the position we are taking with regard to the "defensive value" of a DH is that it is equivalent to the fielding value of a poor-fielding 1B. That is, Frank Thomas, whether as a fielder or as a DH, has the same defensive value.
I know this is not necessarily the clearest of all the positional adjustments we have. However, when you create a model you try to represent reality. And the reality is best represented when the defensive value of Frank Thomas matches as 1B and as DH.
Sunday, August 19, 2018
With outfielders, we treated the players as one big pool, without needing a positional distinction. We can directly calculate the defensive positional adjustment.
With infielders, we kept the three positions as distinct-but-related, with the three positions bridged by common players. So, we can infer the defensive positional adjustment.
In comparing the pool of outfielders to the pool of infielders, we use the replacement pool for each to value the pools the same. And therefore, the gap in offense among the replacement pool (not among the average player in those pools) is balanced by the gap in defense. And that gap is a defensive positional adjustment.
Now, the catchers. You COULD do the same for catchers. You could treat them as their own pool. You could take the replacement pool, and compare that to the replacement pool among infielders and outfielders. And you could presume the gap in offense among the replacement pool is balanced by the gap in defense. You COULD do that. Except catchers have a tougher time hitting. And so PART of the gap in offense and defense is purely attributed to this constraint catchers face. Ideally, we'd separate it out so that part of the positional adjustment for catchers is because a .350 wOBA by a catcher is not the same as a .350 wOBA by a non-catcher. That's because it's REALLY hard for a catcher to do that. There's a catcher-penalty to hitting, much like there's a SP penalty to pitching (or a RP bonus if you prefer). So, we SHOULD have an offensive positional adjustment AND a defensive positional adjustment for catchers.
In the end, it doesn't REALLY matter because it comes out in the wash. Overall, nothing changes. But in terms of isolating the offensive and defensive production, it matters a whole lot.
***
We'll talk about DH next time.
In part 1, we looked at outfielders.
In part 2, we looked at infielders.
Now we need to look at infielders compared to outfielders. Let's step away from baseball for a moment and think of football. We would never presume that an average QB = average OT = average TE. There's nothing inherent about any of that. And certainly, we wouldn't make a QB play OT or TE in order to value him as a QB.
The same thing applies in life, with regard to any product or service. How do you evaluate these things? In the end, it's how much you pay for it. And if you had ten dollars, you've decided how much water and juice and salad and legal advice you will pay for, regardless of how similar the utility of those things is.
In sports, we look at the bubble players, those guys who are paid at or close to the league minimum, regardless of whether they are a QB or a goalie or a SS or a LF.
It is not common for a player to switch between infield and outfield. Indeed, virtually all of these position switches are unidirectional, going from infield to outfield. See, within the infield (2B, SS, 3B) and within the outfield, those players are all part of the same pools. There is no such thing as a pool of 2B. Those guys are not only 2B, but also SS, because any player that is in the SS pool is automatically part of the 2B pool. To some extent, the guys in the 2B pool may (but not necessarily will) be part of the SS pool. So we have an infield pool.
When it comes to IF to OF comparisons, it is closer to comparing a forward and defenseman in hockey than it is to comparing a winger and center. As a result, we need to find a different bridge than we found for infielders. And that bridge is the pool of players who are just hanging on. And those players, if you look at the offense AND defense, will be equals. How do I know? Because they are all being paid the same, regardless of how much water or legal advice they provide.
And once you have established the pool of players that are equals, and once you have calculated their offensive value, the remaining value is their defensive value. And if you have the UZR of these players in the infield and in the outfield, you bridge their UZR value to their defensive value through a positional pool adjustment (infield and outfield pools).
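As a rough sketch of that bridge: if the bubble players in both pools are equals in total value, then offense plus UZR plus a pool adjustment must match across the two pools, and the gap between the adjustments falls out by subtraction. The function name and all of the run values below are hypothetical, chosen only to show the arithmetic.

```python
def pool_adjustment_gap(if_offense, if_uzr, of_offense, of_uzr):
    """Runs the infield pool adjustment must exceed the outfield one by.

    Assumes replacement-level players in both pools have equal total value:
        if_offense + if_uzr + adj_if == of_offense + of_uzr + adj_of
    so adj_if - adj_of == (of_offense + of_uzr) - (if_offense + if_uzr).
    """
    return (of_offense + of_uzr) - (if_offense + if_uzr)


# Hypothetical runs per season for bubble players; not real data.
gap = pool_adjustment_gap(
    if_offense=-20.0, if_uzr=-2.0,
    of_offense=-12.0, of_uzr=-3.0,
)
print(f"Infield adjustment minus outfield adjustment: {gap:+.1f} runs")
```

In this made-up case the bubble infielders hit worse than the bubble outfielders, so the infield pool would carry the larger defensive adjustment.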
***
Next up: catchers and DH.
Saturday, August 18, 2018
We looked at outfielders. Let's now look at infielders, specifically 2B, SS, 3B. As we saw a bit earlier with outfielders, metrics like UZR will "force" the average 2B, SS, 3B to all be "0". However, since each of these positions is its own universe, a 0 at 2B does not mean the same thing as a 0 at SS. This is most clear in high school where it would be impossible for the better fielder to be at 2B instead of SS for any team. League-wide the avg SS would be far better than the avg 2B. But if you had a system like UZR, it would force both to have an avg of 0.
So, what to do? Unlike outfielders, each position has its own responsibilities, so we can't (easily) determine what a 0 at 2B corresponds to at SS. We are NOT asking "what would a 0 at 2B do at SS". We are NOT asking "what would a 0 at SS do at 2B". What we are trying to do is find some common baseline to bridge the two universes. In other words: all other things equal, what would you trade a 0 at 2B for if you wanted a SS? And what would you trade a 0 at SS for if you wanted a 2B? Or, if you paid a 0 at 2B X number of dollars, what kind of fielding at SS would you need to pay the exact same X number of dollars?
And one way to get there is to find a low-paid infielder who spends a lot of time at both positions. Indeed, you will find dozens of such players. These guys are paid the league minimum, they spend as much time on an MLB roster as they do in the minors, they plug holes at 2B, SS, and 3B. Those players provide a common baseline. We know how much they are worth, and we know how well they compare as fielders at multiple positions. These guys are the bridge.
And so we can compare Altuve and LeMahieu and Schoop to this bridge.
And over in the SS universe, we can compare Correa and Simmons and Lindor to this same bridge.
And so, without our two universes of fulltime 2B and SS ever intersecting, we can compare those universes by a bridge of players who have the same value whether they are playing 2B or SS. And for these guys, we know their fielding values at 2B and at SS.
And that's how we bridge the UZR values of Altuve and Correa, without either one ever playing another position, nor do we need to entertain the idea of them playing another position.
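Here is a minimal sketch of the bridge idea, assuming we have per-season UZR for the same bubble players at both positions. Since those players are worth the same wherever they play, the difference in their average UZR at each position gives the offset between the two universes. The UZR numbers and function names are made up for illustration.

```python
from statistics import mean


def bridge_offset(bridge_uzr_2b, bridge_uzr_ss):
    """Runs to add to an SS UZR to express it on the 2B scale.

    Assumes the bridge players have the same overall value at either
    position, so their average UZR at 2B and at SS mark the same point.
    """
    return mean(bridge_uzr_2b) - mean(bridge_uzr_ss)


def ss_on_2b_scale(ss_uzr, offset):
    """Express a full-time shortstop's UZR on the 2B scale."""
    return ss_uzr + offset


# Hypothetical per-season UZR for the same bridge players; not real data.
uzr_at_2b = [-2.0, -4.0, -3.0]
uzr_at_ss = [-7.0, -9.0, -8.0]

offset = bridge_offset(uzr_at_2b, uzr_at_ss)
print(f"A 0 at SS corresponds to {ss_on_2b_scale(0.0, offset):+.1f} on the 2B scale")
```

Neither the full-time 2B nor the full-time SS ever has to change positions; only the bridge players appear in both lists.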
***
Next installment, it's the catchers and DH.
Positional adjustments do two things, maybe three(*), which we wrap into one. When we do that wrapping we lose sight of those two or three things.
(*) I'll tell you if it's two or three after I finish writing these blog posts.
Let's take the easy one. When you look at something like UZR, those metrics establish a zero-point at the positional level. In effect, each position is its own universe. You CANNOT compare the zero-point across positions. More specifically, since the average SS in UZR is 0 and the average LF in UZR is 0, these two 0s are not equal to each other, even though they both show 0. 0 <> 0. That's because one is 0 in the SS universe and the other is 0 in the LF universe.
However, in Catch Probability, LF, CF, and RF belong to the same universe. In UZR, the avg CF is 0 and the avg LF is 0 and the avg RF is 0; in no way does UZR, by itself, allow us to directly compare CF to LF to RF. Catch Probability however DOES allow us to compare the three because it doesn't treat the three as three positions, but rather one: outfield. And it compares each outfielder to ONE common standard. And so the average OF = 0. What is the average CF? This is important: in Catch Probability it does not "force" the average to be anything. We can actually let the model tell us what is an average CF. If for example, for some reason, Hamilton and Buxton and Inciarte and Kiermaier and the rest of them all became full time LF, then the average LF would end up being better than the average CF. But Catch Probability doesn't force that. It simply uses one baseline for the entire outfield. And we can then determine the quality of the avg LF, avg CF, and avg RF based on the results of the system.
As it turns out, these days, the avg CF is 5 runs ahead of RF who is 5 runs ahead of LF. But certainly in prior decades LF = RF, and potentially LF > RF in the really early days. That's not the world of today however.
Coming up: we'll talk about infielders, catchers, and DH.
Monday, August 13, 2018
As we know, in this day and age where Javy Baez can play anywhere on the field, even pitch to pitch, the designation of "2B" and "3B" is not very helpful, analytically. And it is especially not helpful when an infielder is part of a 4-man outfield, yet maintains his infielder designation. I generate a warning report when an infielder makes a putout deep in the outfield... only to find that it might be Kris Bryant who officially maintained his 3B position. His ROLE however was quite different.
So as to maintain some semblance of continuity with the 1-9 position designation we've come to know and love, but to enhance it to make use of how the players are placed, we are working to create ROLES. This is a first pass.
Saturday, August 11, 2018
This is just a collection of my tweets from yesterday and today. The basic point is that Nola (a) has one of the best BABIPs and (b) plays with one of the worst fielding teams. And so, it boils down to: how do you adjust for his context?
And more specifically: do we treat his fielders as having the expectation to play at their typical fielding level FOR THE SEASON or ON THOSE PARTICULAR TIMES WHEN NOLA IS ON THE MOUND? It's a nuanced distinction that has a very specific implication to Nola.
Here they are:
Friday, August 10, 2018
There was a pretty fun thread last year on this issue. I suggest you spend 10-20 minutes reading all that first.
...
Back already? Ok, so just to follow up with current data.
1. Is there a bias in catch prob based on CF and corners?
Using the same matching method noted in the main thread, I have 211 outfielders with a total of 11907 plays (roughly 32 162-game seasons), and these players were +19 outs above average in CF and the same players are +32 outs above average in the corners. Pro-rated down to a single season, that works out to:
- +0.6 outs in CF
- +1.0 outs in corner
Therefore, we can conclude that catch probability is not biased by CF/corners. To the extent that it is, we are talking about less than half an out per season.
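The pro-rating above is simple division, and can be checked with a short sketch. The totals (+19, +32, roughly 32 seasons) come from the text; the function name is just for illustration.

```python
def per_season(total_outs_above_avg, seasons):
    """Pro-rate a total of outs above average down to one season."""
    return total_outs_above_avg / seasons


SEASONS = 32  # "roughly 32 162-game seasons" of matched plays

cf = per_season(19, SEASONS)      # same players, in CF
corner = per_season(32, SEASONS)  # same players, in the corners
gap = corner - cf                 # size of any CF/corner bias

print(f"CF: {cf:+.1f}, corner: {corner:+.1f}, gap: {gap:.1f} outs per season")
```

The gap between the two is under half an out per season, which is the basis for treating the metric as unbiased by position.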
***
2. Having established that we have no positional bias in the metric, what are the outs above average (per play) by position, league-wide for all players?
- CF: +0.013 outs per play
- RF: -0.003 outs per play
- LF: -0.014 outs per play
On a seasonal basis, and applying the opportunity of each position we have:
***
3. What is that in runs and what about the arm?
You can multiply the above number by a bit over 0.8 to get the runs. You can also figure that the ARM is about +1 for RF, 0 for CF and -1 for LF. So, we'd have these presumed run values:
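As a sketch of that conversion: take seasonal outs above average, multiply by the run value per out, and add the arm. The 0.8 multiplier (the text says "a bit over 0.8") and the arm values come from the paragraph above; the function name and the seasonal outs passed in are hypothetical inputs, not the actual league figures.

```python
RUNS_PER_OUT = 0.8  # the text says "a bit over 0.8"; 0.8 used as a round figure
ARM_RUNS = {"LF": -1.0, "CF": 0.0, "RF": 1.0}  # presumed arm value by position


def presumed_runs(seasonal_outs_above_avg, position):
    """Convert seasonal outs above average into presumed runs, arm included."""
    return seasonal_outs_above_avg * RUNS_PER_OUT + ARM_RUNS[position]


# Hypothetical seasonal outs above average, only to show the arithmetic.
for position, outs in [("CF", 5.0), ("RF", 0.0), ("LF", -5.0)]:
    print(f"{position}: {presumed_runs(outs, position):+.1f} runs")
```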
***
4. So what does that say about the positional adjustment?
You can make the case that this, which we use in WAR:
Should be this:
Sunday, July 22, 2018
I've been saying that the pinnacle of sabermetrics is convergence of scouting and performance analysis for 15 years. Here's one such exchange back in 2006:
I think people like to associate "numbers" and performance analysis to sabermetrics, and relegate scouting and observation as some ugly duckling. Sabermetrics is about the search for truth about baseball. And, at its core, baseball is about the physical and mental abilities of its players, which manifest themselves in explosions a handful of times in a game. Since we have limited samples in which to evaluate a player by his performance, we need to supplement that with some keen observations. The pinnacle of sabermetrics is the convergence of performance analysis and scouting.
And for the last 15 years I've been showing that belief in running the Fans Scouting Report, nicely hosted at Fangraphs, and un-nicely on my site. There's not a single name in that top 30 list that you could think "nah, way too high". And I think the reason this project works is because I broke it up into components:
(Image: Fans Scouting Report leaderboard, broken into components.)
And if you follow the headers left to right, you can see what I did: I followed the path of the ball. The blue is before contact, the green is post-contact, and the purple is post-catch. So rather than ask the fan to aggregate 400 plays of a fielder, of which they may have seen 100, into one number, I instead asked the fan to focus on components. Not only did the focus on components free them from the potential bias in advanced stats like UZR, it also gives us a glimpse into a player's fielding profile that we'd otherwise not get.
Until now. This is what we're working on.
(Image: leaderboard of top fielding performances on 2+ star plays.)
We already have each fielder's overall performance on Savant. What we're working on is HOW. The above represents the top fielding performances on 2+ star plays. We can see the unsurprising names (especially if you've been following closely).
The metric is in terms of feet (relative to the outfield average). Why feet, instead of time? That's a good question. In constructing such a metric, or any metric, you have to approach it multiple ways to understand the benefits and costs of doing it each way. And in this case there are two avenues:
- set your thresholds based on time, and measure your metric in distance
- set your thresholds based on distance, and measure your metric in time
Both are valid, both are reasonable. There are three reasons the PRIMARY view of the metrics will be based on distance:
- The range in feet is going to be wider than in seconds, so that we don't have to go to two decimals like we would for time. We could potentially go with just integers for feet. Would you prefer to see a range of feet of +/- 5, or a range in seconds of +/- 0.20?
- You can SEE distance. When an outfielder misses a play by "that much", we see "that much" as having missed the play by say 2 feet or 5 feet. We do not think that he missed it by 0.08 seconds or 0.20 seconds.
- Route. If you wanted to measure the indirectness of a route, would you think in terms of his indirect route added 6 feet to his run, or that it added 0.25 seconds? Especially if you tie it in to #2. His indirect route added 6 feet, he missed it by 5 feet. There's the story.
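The paired examples in the list (2 feet or 0.08 seconds, 5 feet or 0.20 seconds, 6 feet or 0.25 seconds) line up with a running speed of roughly 25 feet per second. That speed is my inference from those pairs, not a stated parameter of the metric, so treat this sketch as an illustration of the time-to-distance translation only.

```python
ASSUMED_SPEED_FT_PER_S = 25.0  # inferred from the paired examples; an assumption


def seconds_to_feet(seconds, speed=ASSUMED_SPEED_FT_PER_S):
    """Translate a time margin into the distance covered at a given speed."""
    return seconds * speed


for seconds in (0.08, 0.20, 0.25):
    print(f"{seconds:.2f} seconds is about {seconds_to_feet(seconds):.1f} feet")
```

A spread of +/- 0.20 seconds becomes +/- 5 feet, which is why integers can be enough when the metric is shown in feet.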
Anyway, that's what's in the lab, it's what I'm working on right now. And you can see early returns on Twitter. Or keep following along in my blog. Eventually we'll encapsulate the entire play to include positioning (so that the reader can decide whether to include it for the fielder or for the team), as well as post-catch results (throwing). And as we are ready to roll out, you'll see it all on the Savant pages. More to come...