Batted_Ball
Batted Ball
Monday, August 27, 2018
As I showed two weeks ago, we will be adding "Role" designations to fielders (in addition to maintaining their official position). It takes just a small amount of effort to understand the Fielder Roles. They are grounded in the traditional positions, with additional identifiers of ".1" (left) and ".2" (right). There are a few other slices or zones, and you can kind of figure out the scheme.
Now that we know the role for each player, for each play, and we have assigned each play to one fielder (based on the proximity of the ball to the player), we now need to know their FUNCTION, the landing point of the ball. It's going to be more exciting when I show it for infielders, but let me describe it for outfielders, since it's easier.
For a fielder who plays in the traditional LF spot (role 7.0), when the ball lands in that same area (landing 7.0), 76.6% of those plays are converted into outs. If you look at landing 7.1, meaning the ball lands in the slice toward the LEFT side of the field, 72.9% of those plays are converted into outs. For balls that land toward the right side of the field (landing 7.2... and remember we are still looking at role 7.0), 70.6% of those plays are converted into outs.
If you look at each of the green boxes, you will see that the conversion rate is highest when the ball lands in the same slice where the fielder is standing, and the farther the ball lands from that slice, the fewer outs are made per play. (For the most part anyway.) This is fairly obvious in terms of DIRECTION. What we see here is the MAGNITUDE of that effect.
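To make the role-by-landing grid concrete, here is a minimal sketch of how such a table of conversion rates could be tallied. The function name, the tuple layout, and the sample plays are all made up for illustration; this is not the actual Statcast pipeline.

```python
from collections import defaultdict

def conversion_rates(plays):
    """Out-conversion rate for each (role, landing) cell.

    plays: iterable of (role, landing, is_out) tuples (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0])  # (role, landing) -> [outs, plays]
    for role, landing, is_out in plays:
        cell = totals[(role, landing)]
        cell[0] += int(is_out)
        cell[1] += 1
    return {key: outs / n for key, (outs, n) in totals.items()}

# Made-up plays, for illustration only (not real data).
sample = [
    ("7.0", "7.0", True), ("7.0", "7.0", True),
    ("7.0", "7.0", True), ("7.0", "7.0", False),
    ("7.0", "7.1", True), ("7.0", "7.1", False),
]
rates = conversion_rates(sample)
for (role, landing), rate in sorted(rates.items()):
    print(f"role {role}, landing {landing}: {rate:.1%}")
```

Each cell is simply outs divided by plays, which is all the percentages quoted above are.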
I'll let you digest this for a bit, then I'll post the infielder data.
Tuesday, July 24, 2018
Logistics.
First, I'll tell you the right way, or at least a good way, to make sure that xwOBA and wOBA match. I would use a rolling one-year period to make sure that they are equal. This way, by the end of the year, it would be a perfect match for that season. Why not use season-to-date? Well, what would you do after one week or one month, especially if that is a very cold month? By forcing it to match at the April-2018-only level, you'd suddenly be incorporating temperature, which is not what you are after. Or at least, it's not what I am after. Roll it with a one-year window and all that goes away. The cost is that whatever property the 2017 season has gets carried over to the 2018 season. Then again, as of Apr 30, 2018, we don't have great confidence as to what the rest of 2018 has in store.
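A minimal sketch of the rolling one-year idea, under stated assumptions: each play carries a date, its actual wOBA value and its model xwOBA value, and the correction is a single multiplicative factor. The post does not say whether the adjustment is multiplicative or additive, so that choice, the function name, and the sample numbers are all hypothetical.

```python
from datetime import date, timedelta

def rolling_scale(plays, as_of, window_days=365):
    """Factor that makes total xwOBA equal total wOBA over the trailing window.

    plays: iterable of (play_date, woba_value, xwoba_value) tuples.
    A multiplicative factor is an assumption made for this sketch.
    """
    start = as_of - timedelta(days=window_days)
    window = [(w, x) for d, w, x in plays if start < d <= as_of]
    total_woba = sum(w for w, _ in window)
    total_xwoba = sum(x for _, x in window)
    if total_xwoba == 0:
        return 1.0  # nothing to calibrate against
    return total_woba / total_xwoba

# Made-up plays, for illustration only.
plays = [
    (date(2016, 1, 1), 2.0, 0.1),   # older than one year: ignored
    (date(2017, 6, 1), 1.0, 0.5),
    (date(2018, 4, 15), 0.0, 0.3),
]
scale = rolling_scale(plays, as_of=date(2018, 4, 30))
print(round(scale, 3))
```

Because the window always spans a full year, a cold April is blended with the previous summer rather than driving the correction on its own.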
Anyway, that's one way, and there are other ways to make sure xwOBA = wOBA at every point. Plenty of you Straight Arrow readers I'm sure have your own solution.
Now, logistics. There are two main logistical scenarios we have. The first is that we prepare media and other things that use xwOBA. If it changes every day, that means that every piece of data has to get updated every day. The second is that we have to synch all our endpoints, including Savant. It's a long process for all the endpoints to synch up. Sometimes we are even delayed, and when Savant is not updated by 11:00 ET, we hear about it. When we run Statcast Searches, we have to make sure all of it is queryable on a play-by-play basis, so everything has to match everywhere else. So, we have constraints to deal with.
The plan was at the all-star break, we'd revisit the issue, and see if we could synch it up. Entering the 2017 season, we used 2016 as the model. And we really undershot. Entering 2018, we used the 2017 season as the model. And now we really overshot. That 2017 season, plus the season-to-date temperature has upset our plans somewhat. We are trying to figure out WHY before we make a wholesale change and create a 2018 model.
That said, by the end of the 2018 season, we will refresh. Now it's just a matter of the logistics until then.
Monday, July 02, 2018
This weekend we rolled out a new snazzy graphic with our broadcast partners:


The idea, here anyway, is straightforward: show the batter's spray pattern over his last 100 batted balls that landed within 200 feet of home plate. As we've previously discussed, the line for distinguishing an infielder from an outfielder, based on where the fielder is positioned, is at 220 feet. Therefore, when we look at the landing spot of the ball instead, the cutoff would have to be less than 220 feet. Following a similar process, this is how many plays are initiated by infielders and outfielders based on the landing spot of the ball.
The criss-cross point is 211-212 feet. If we focus on the point where 80% of the plays are made by the infielder, that's right around 200 feet for the landing distance. And so, that's where we settled.
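As a rough sketch of how a cutoff like this could be read off the data: bin the plays by landing distance and compute the share initiated by an infielder in each bin. The bin width, the "IF"/"OF" labels, and the sample plays are assumptions for illustration only.

```python
def infielder_share_by_distance(plays, bin_width=10):
    """Share of plays initiated by an infielder, per landing-distance bin.

    plays: iterable of (landing_distance_ft, fielder_type) tuples,
    with fielder_type "IF" or "OF" (hypothetical labels).
    """
    counts = {}
    for dist, ftype in plays:
        bin_start = int(dist // bin_width) * bin_width
        if_n, total = counts.get(bin_start, (0, 0))
        counts[bin_start] = (if_n + (ftype == "IF"), total + 1)
    return {b: if_n / total for b, (if_n, total) in sorted(counts.items())}

# Made-up plays, for illustration only.
sample = [
    (195, "IF"), (198, "IF"), (199, "IF"), (197, "IF"), (196, "OF"),
    (215, "IF"), (216, "OF"),
]
shares = infielder_share_by_distance(sample)
for bin_start, share in shares.items():
    print(f"{bin_start}-{bin_start + 10} ft: {share:.0%} infielder")
```

With real data, the bin where the share crosses 50% would be the criss-cross point, and the bin where it crosses 80% would be the cutoff.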
As for the number of batted balls, it was clear that early in the season we couldn't rely on season-to-date, and so we approached it on the basis of a rolling average. In terms of the spray tendency, the amount of regression required was fairly low, so using 100 is right where we were comfortable showing the signal. I'll update this post in a bit with that research.
Anyway, back to the charts: they look pretty sweet, as the open slice matches the lowest spray number among the five slices. In other words, we approached it on the idea that we have 4 infielders, and so the question for the manager is "where do I give up the hole?". And in those two images, you can see why the manager did what he did.
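A minimal sketch of the last-100 spray tally. The post only says there are five slices; splitting fair territory (-45 to +45 degrees) into five equal slices is an assumption here, as are the function name and the sample angles. The input is assumed to be already filtered to balls landing within 200 feet.

```python
from collections import Counter

def spray_shares(spray_angles_deg, last_n=100, n_slices=5):
    """Share of the most recent batted balls falling in each spray slice.

    Slices are equal-width cuts of -45..+45 degrees (an assumption).
    """
    recent = spray_angles_deg[-last_n:]
    if not recent:
        return [0.0] * n_slices
    width = 90 / n_slices
    counts = Counter()
    for angle in recent:
        clamped = min(max(angle, -45.0), 45.0)
        idx = min(int((clamped + 45.0) // width), n_slices - 1)
        counts[idx] += 1
    return [counts[i] / len(recent) for i in range(n_slices)]

# Made-up angles, for illustration only.
angles = [-40] * 10 + [-20] * 20 + [0] * 30 + [20] * 25 + [40] * 15
shares = spray_shares(angles)
open_slice = shares.index(min(shares))  # where to "give up the hole"
print(shares, open_slice)
```

With four infielders and five slices, the slice with the lowest share is the natural candidate to leave open.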
Naturally, we prefer doing one based on "true talent", and so a "spray forecast", that includes the batter and pitcher, and potentially the base-out situation. But, for broadcast purposes, simply going with "what has the batter done" was the preferred approach. As much as possible, we should stick to the facts, when the facts can tell most of the story.
Tuesday, April 03, 2018
We have made updates to the hit probability model. The original model used only exit velocity + launch angle (EV + LA). We added two additional criteria:
- if the ball was hit with an EV + LA such that it was potentially close to the generic fence line, then we're also including the spray direction
- if the ball was topped or hit weakly (which we are treating as the batter potentially legging out a hit), then we're also including the batter's seasonal sprint speed
We're going to have a more elaborate post on this. Those who attended SABR got a sneak peek at this.
Update: this is current for 2018 season, as well as retrospectively for 2017 season. And we'll be updating 2016 and 2015 as soon as we can. Might be a few weeks.
Tuesday, December 19, 2017
In the third of three articles that I enjoyed in the 2018 Shandler Baseball Forecaster, the author describes his Deserved HR idea. Longtime readers will think of MGL's Virtual HR, which at its core is essentially "UZR for HR". The same model is used by all sports. The real question is whether creating this metric adds value beyond just ACTUAL HR.
We see this in hockey as well, where NetShots has taken over from NetGoals in many analysts' eyes in predicting future NetGoals. (Goals, or Runs, or whatnot is what we are always after.) The volume of Shots, being 10x to 20x that of Goals, is one reason why it excites analysts. However, by the time the number of games is large enough, the extra non-Goal Shots add very little. So, the non-goal shots are a leading indicator... until you have enough goals... at which point those extra non-goal shots become a coincident indicator.(*)
(*) Term used by Tom Awad.
The same question would apply here. Based on the article, the results look promising, that two years of Actual HR is equivalent to one year of Deserved HR. You can see MGL's tweets from last night along with my followups, as for why we want Deserved HR as one component, but why we also don't want to go too far.
Therefore, what I'd like to see in future analysis for the aspiring saberists, is this:
- Create three pools of batted balls: (a) those that are both Actual and Deserved (b) those that are Actual but not Deserved (i.e., lucky) and (c) those that are Deserved but not Actual (i.e., unlucky)
- Compare the three pools to next year's data and see how much each pool predicts next year
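The first step above can be sketched as a simple classification. The pool labels and the (batter, actual, deserved) layout are hypothetical, and the sample rows are made up; the second step would then relate each batter's pool counts to his next-year HR.

```python
from collections import Counter

def pool_of(actual_hr, deserved_hr):
    """Assign a batted ball to one of the three pools (or None)."""
    if actual_hr and deserved_hr:
        return "both"      # (a) Actual and Deserved
    if actual_hr:
        return "lucky"     # (b) Actual but not Deserved
    if deserved_hr:
        return "unlucky"   # (c) Deserved but not Actual
    return None            # neither: not part of the study

def pool_counts(batted_balls):
    """batted_balls: iterable of (batter, actual_hr, deserved_hr) tuples."""
    counts = Counter()
    for batter, actual_hr, deserved_hr in batted_balls:
        pool = pool_of(actual_hr, deserved_hr)
        if pool is not None:
            counts[(batter, pool)] += 1
    return counts

# Made-up batted balls, for illustration only.
sample = [
    ("A", True, True), ("A", True, False), ("A", False, True),
    ("A", False, False), ("B", True, True), ("B", True, True),
]
counts = pool_counts(sample)
print(dict(counts))
```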
If this is like NetGoals and NetNonGoalShots, we should expect to see the Actual carry more weight than the non-actual to some extent, and the more years you have the more the actual should carry. In other words, those pools that we think of as "lucky" may actually not be as lucky as it's presumed. They may look lucky based on this model, but only because the model doesn't account for everything.
In the second of three interesting articles, the author talks about the "carry" of the ball (over two separate articles). He confirms what I, and others, found years ago: that essentially 1 extra foot of carry results in 3% more HR. He also makes a good point regarding the fences being "closer": since the fences are not circular, a hitter will more likely hit a HR if he hits the ball closer to the line than if he hits it straightaway. While this is obvious, I like the description of saying that the hitter himself can "move" the fences closer. You can also think of "carry" based on the spin rpm and axis. You can think of shooting pool, where if you hit the ball too off-centered, then no matter how hard your "exit velocity" of the cue stick is, you won't get any speed along that attack angle.
Sunday, June 18, 2017
One of the new wave of saberists out there, Andrew, does some excellent work. One of the things he's doing is taking "estimated wOBA" and applying it in place of actual outcomes. And then using that to forecast future RA/9. Except. Well, except that the best predictor, even better than FIP, is simply K minus BB per PA. That's right. Completely ignore batted balls, whether hard hit or not, by angle or not. Ignore basestealing, ignore everything. Except K and BB.
His main issue is that he basically used a chart like this:
And simply added up those values for every batted ball. Prima facie, this is entirely reasonable. You have a high pop up? Let's count that as close to a 0 wOBA (i.e. out). You have a 28 degree 115mph shot? Let's count that as close to a 2.000 wOBA (i.e., HR). And I would say 99% of researchers would do exactly that.
But, what if I tell you that the ENTIRETY of a player's batted ball profile can be determined by the frequency of his barrels? That is, rather than assign a value to every batted ball, let's only assign one value, the same value, to simply those 6% of the balls that fall into the "barrels" category?
You'd think I'm crazy, right? Look how strong that relationship is, looking at barrels to wOBA the following year (not wOBA on batted balls, but overall wOBA including BB and K!)
Well, Andrew just demonstrated that it's better to discard 100% of the batted balls, than to include all of them. (Voros is smiling.) And I'm saying, let's at least START by discarding 94% of the batted balls, and focus on the 6% hit at the ideal speed+angle.
Then, you can start adding a bit more. You can add the near-barrels, those well-struck balls that just miss being barrels. You can add the flares and the burners. And so on. Once you do this, then you'll be in a much better position to forecast the future.
Monday, May 09, 2016
Pirates are reeling them in.
Just to make sure this isn't one of those instances where a team is all talk in the spring and no walk in the regular season, Mike Petriello's got the data, and nobody's playing a more shallow center field relative to last year than McCutchen. In 2015, McCutchen lined up 316 feet from home plate, on average. This year? He's averaged a touch under 300 feet (data current as of May 2).
Saturday, May 07, 2016
Everyone's doing it. And just now, Ben over at BIS through Bill's site posted a counter to Pizza's recent article.
However, there's a flaw in these analyses, one that we've been pointing out for years. And one that Bill James also pointed out in a rather lengthy post a few years ago: you cannot limit the analysis to only those times when a ball is put in play. The question is very simple, very very simple: What is the batter's overall production when he is shifted? That's the question. That's the singular simple question. And to answer that you have to (a) know when he's shifted BEFORE seeing the results and (b) include all the outcomes, including BB, SO, HR.
You cannot say that because a batter BB, SO, or HR that the shift had no effect in either way. Au contraire. Given that the batter is responding to the stimulus of a new defensive alignment, EVERYTHING is on the table. How he approaches that situation is at the very heart of the question.
Therefore, every single analysis you see... every single one, without exception... if it doesn't reference a player's overall production, and that means including BB, SO, HR production, its conclusion can simply be discarded.
***
It also gets worse because some of these trackers are only marking a shift if they think the shift affected the result. That is, a flyball hit to the LF is not being marked as "shift" data, because the reasoning is that regardless of whether there was a shift or not, the result would have been the same. That's silly on its face, and it's silly underneath the surface. I don't know if these trackers are still doing this, but this is wrong.
Monday, May 02, 2016
For some 15 years, I've been talking about the convergence of scouting and performance analysis. That is what sabermetrics really is. Sabermetrics is not just about "stats". Scouting information actually provides the necessary prior to evaluate the resulting performances. If you had zero history on Felix and Armando, and you were told to watch their perfect games, a scout might be able to distinguish differences that the resulting performances (0 for 27) did not. As I noted back in 2009, "The idea is to create a model that is complex and comprehensive enough as to make both performance and scouting data obsolete." That is, the convergence of scouting and performance analysis is when they both disappear. As much as it's not possible for that to be true, the goal is to work toward that end.
And William provides some excellent data points toward that end. In terms of "extrapolation", I would have used Alan's Trajectory Calculator as a (very strong) prior. It's inconceivable that you could have a 100mph 8-second popup that travelled 100 feet. The hitter would have to have intentionally gone for a huge uppercut. Setting that small issue aside (on frankly data that won't exist anyway, so it doesn't really matter how you model the parts of a system that can't happen), we get into the fun stuff:
When used in conjunction with Steamer, the impact of exit velocity was still significant, both statistically and practically. We can expect a hitter to outperform his Steamer projected wOBA by roughly three points for each mph of previous-season exit velocity. (We found a similar effect when using either ZiPS projections or an average of ZiPS and Steamer.)
Note that since they are using wOBA (which includes BB and SO), what they are forecasting is not just future batted balls, but all plate appearances. Which may SEEM wrong, or at least weird, but think about it: a guy who hits the ball harder has an effect on how a pitcher will pitch to him. You wouldn't just want to forecast his wOBA on Contact based on his exit speed, but his wOBA on ALL his plate appearances. Naturally, you could break it up so you can see wOBA on Contact and wOBA on non-contact, as well as the FREQUENCY with which he makes contact in the future. By looking at future wOBA, all that gets rolled into one.
We've known about the importance of hang time ever since Robert Dudek's simple, yet groundbreaking article, way back in the premier issue of Hardball Times. William continues in that tradition, but also takes advantage of known data points. This is what sabermetrics is all about.
Tuesday, April 19, 2016
Pizza gives us the results of his correlation. Note that since Pizza split his data based on time, we end up with a bias, which is most easily seen with the pitcher data. Setting that aside, I'll present his data, with one extra column:
BIP    r        Regression Amount
10     0.350    19
20     0.527    18
30     0.635    17
40     0.679    19
50     0.732    18
As he showed, the more trials you have, the higher the correlation. This is a given (setting aside systematic bias). If you had a billion trials, you'd have r almost exactly 1. That's why reporting correlation without reporting the number of trials is useless. So, good job by Pizza in showing us this progression.
The extra column I added was the Regression Amount, which is simply determined as (1-r)/r * number of trials. We hope and expect that the regression amount is constant, regardless of number of trials. And, lo and behold, it is! You basically simply add 18 batted balls at league average exit velocity, and, voila, you have an estimate of the hitter's true talent level of exit speed.
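The arithmetic behind that column, and the "add 18 league-average batted balls" step, can be sketched as follows. The r values are Pizza's from the table; the helper names and the 88 mph league average in the usage example are made up for illustration.

```python
def regression_amount(r, trials):
    """Trials of league-average performance to add: (1 - r) / r * trials."""
    return (1 - r) / r * trials

def regress_to_mean(observed_mean, n, league_mean, ballast=18):
    """Blend the observed mean with `ballast` trials at the league mean."""
    return (observed_mean * n + league_mean * ballast) / (n + ballast)

# (BIP, r) pairs from the table above.
pizza = [(10, 0.350), (20, 0.527), (30, 0.635), (40, 0.679), (50, 0.732)]
amounts = [round(regression_amount(r, n)) for n, r in pizza]
print(amounts)

# Hypothetical hitter: 92 mph over 18 batted balls, league average 88 mph
# (the 88 is a made-up number for this example).
estimate = regress_to_mean(92.0, 18, 88.0)
print(estimate)
```

With 18 observed batted balls and 18 added at league average, the estimate lands exactly halfway between the hitter's observed speed and the league's.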
If you remember, we did something similar back in the early PITCHf/x days for a pitcher's THROWING speed. And if I remember right, the regression amount we added was... 1 pitch. That is, a pitcher's true talent throwing speed is almost instantly known.
Which of course makes sense, since there's no variable between the pitcher and his release that we really have to consider. With a batter, he has to respond to the pitcher, and he may not hit the ball squarely, or on the same plane. So, there's a gap between his base talent level and his results.
The closer what you do is to what you deliver, the less regression you need. In addition, the wider the overall spread in talent, the less regression you need. When you have one guy averaging an exit speed of 90mph and another at 75mph, that's a wide spread to begin with. It's much quicker to determine that correlation than if everyone exited at 88-90mph.
Friday, February 05, 2016
This is a terrific chart from @darenw and @StatCast. I'd make each line proportionate to the frequency, so it really stands out in terms of Cabrera's swing plane matching his production level. That is, you'd like to see that where he gets the best production is also when he swings it the most.
Anyway, so Daren shows both batting average and HR, which is one of the very few times I like batting average. Batting average directly corresponds to successful contact, and HR directly corresponds to "perfect" contact.
Missing in there are doubles+triples, which is where wOBA comes in. However, given that both BA and HR are key pieces of data that you must show, the question is if we want to introduce a third piece of data, and whether that third piece should be wOBA or SLG. Let me make the case for SLG (even though my preference is wOBA).
Batting average has a "1" for 1B, 2B, 3B, HR. SLG has values of 1, 2, 3, 4, respectively. If you take 60% of batting average and 40% of SLG, you get this: 1, 1.4, 1.8, 2.2.
wOBA is 0.9, 1.25, 1.6, 2.0. If you multiply all that by 1.11 for scaling purposes, you get: 1.00, 1.39, 1.78, 2.22. What does this mean? In effect, wOBA (on contacted balls) is 60% batting average and 40% SLG. You can therefore introduce SLG, instead of wOBA, and you'll get to convey the information you need.
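The arithmetic in the last two paragraphs can be checked with a few lines. All weights are the ones quoted above; the variable names are just for this sketch.

```python
# Per-hit values for 1B, 2B, 3B, HR, as quoted in the text.
ba_values = [1, 1, 1, 1]
slg_values = [1, 2, 3, 4]
woba_values = [0.9, 1.25, 1.6, 2.0]

# 60% batting average + 40% SLG.
blend = [round(0.6 * b + 0.4 * s, 2) for b, s in zip(ba_values, slg_values)]

# wOBA rescaled by 1.11 so that a single is worth about 1.
scaled_woba = [round(w * 1.11, 2) for w in woba_values]

# Largest gap between the two scales, across the four hit types.
max_gap = max(abs(a - b) for a, b in zip(blend, scaled_woba))

print(blend)
print(scaled_woba)
print(round(max_gap, 2))
```

The two scales never differ by more than about 0.02 per hit, which is why SLG alongside batting average conveys nearly the same information as wOBA on contacted balls.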
I should point out however that you CANNOT IGNORE SF, not for the purposes of the chart above. While batting average is an official stat, and so, we can't redefine it, "batting average on contacted balls" is not an official stat. So, we get to control the denominator. You can't throw away sac flies from contacted balls in the chart above. It's part of the frequency.
As well, reaching on error: those should count as well. After all, if you have a high exit velocity on a downward trajectory, that might increase the chance of error. Again, that's directly tied in to success. At some point, and soon, we'll think of "errors" being "bad" for offense as one of the most confusing things we'd have ever considered.
Monday, January 25, 2016
Suppose you are an outfielder that is positioned very well. You are involved in fairly aggressive shifting. Sharp flyballs or liners get caught at a higher rate on your team than on other teams as a result. Because the balls are hit hard, you likely would not have run too much. Maybe these balls are caught at an average travel distance of 30 feet.
Suppose you are an outfielder that always plays the same spot all the time. Your team doesn't believe in shifting, not even by batting hand. You won't catch most of those liners, but you will catch all those lazy flyballs that you jog 60 feet to catch.
Which outfielder did better? This is the concern when you think of average distance travelled as necessarily a good thing. Now, we don't necessarily live in such extreme conditions, so all that means is that there are layers of nuance to account for. All to say: be careful how you interpret data, like in this article.
These presentations are critical first steps, but they are the beginning. We need a lot of sifting before we can come to an opinion.
Tuesday, January 05, 2016
Good stuff from Jeff. Having read Jeff for many years, I think Jeff would likely characterize himself as NOT a stathead. Taking that as a given, I would also say that Jeff is an extreme saberist.
Jeff has the first key ingredient, and that is, he is a subject matter expert. What he thinks about, what he uncovers, what he looks for, these are things that only a true baseball fan would even think about. A non-baseball statistics expert wouldn't necessarily think about such things. And a pure stathead might stumble upon it. But a non-stathead baseball fan? Yes, that's the kind of thing he thinks about.
Jeff has a second key ingredient, and that's to be able to translate the idea into something that can be organized into various components. And once you have those two things, all it takes is to roll up your sleeves and look for the right data in the right way. Then you get saber-magic. And you get nuggets like this.
And that Jeff is a writer at the quality of Joe Posnanski, that makes Jeff an extremely readable saberist, part of the Bill James family. Obviously, no one rises to Bill's level, but Jeff has all the little things that Bill has, like comparing Ken Griffey Jr to Willie Mays.
Monday, November 30, 2015
This was one of my favorite threads (start at post 8), notably because of Brian C's involvement a little later on, who does a tremendous job, including the presentation. I bring up this thread because of Jeff's recent article.
Tuesday, August 11, 2015
UPDATE: see comment 7.
***
As I was reading this article, I was thinking: "human stringers are going to have a higher correlation than the camera/Doppler with ISO". And, it's true.
***
Interlude/update: I just read the comments, and Rally said the same thing:
When I see stronger correlations with ISO, SLG, etc. to hard hit%, the first question that pops into my head is how much scoring bias is here. If you're trying to decide whether a borderline hit was hard or medium, I'd guess that the one that falls for a double is more likely to be scored hard hit than the same ball caught by an outfielder.
***
The author reports that the BIS stringer data has an r-squared of .70 (r=.84) between ISO and human-tracked "contact strength". ISO you will note is SLG minus batting average (basically, extra base hits, with extra weight for HR).
The author also reports that the correlation between batted ball speed and ISO is r=.62.
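To compare the two figures on the same footing, convert between r and r-squared. A quick check using only the numbers quoted above (variable names are just for this sketch):

```python
import math

# Human stringer "contact strength" vs ISO: reported r-squared of .70.
stringer_r = math.sqrt(0.70)

# Batted ball speed vs ISO: reported r of .62.
speed_r_squared = 0.62 ** 2

print(round(stringer_r, 2), round(speed_r_squared, 2))
```

So the comparison is r of about .84 against .62, or equivalently r-squared of .70 against about .38.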
So, does this mean that how a human established "contact strength" is better than how a camera/Doppler/algorithm does it? No. It's basically evidence that a human stringer is more likely to mark a batted ball with higher "contact strength" if it went for extra bases than if it was caught.
Possible biases could be with ground balls. You can have a three-hopper going to the SS with an exit velocity of 95mph. But if it's a routine out to the 1B, what are the chances that the stringer is going to mark that as hard-hit? And similarly, a ball launched at 40-45 degrees at 95mph won't be marked as hard-hit as often as those launched at 15-20 degrees at 95mph.
We have to accept one thing: the exit velocities being reported (at least through Trackman/StatCast) are the gold standard. (Sportvision is the silver standard.) The human tagging of a play as hard-hit or not is inferior. And so, if say a human tags 20% (I'm making up the number) of exit velocities of 70-75mph as "hard hit", and furthermore, among those 20% are gap hits, that's a bias. A human bias.
In order to validate BIS (or any human stringer), we need to see the correlation of the BIS data and the outcomes (1b, 2b, 3b, hr, out) against the Gold (or Silver) standard (exit speed and launch angle). Then the bias will be apparent. And it'll explain the correlations noted above.
Sunday, August 09, 2015
Great job by Henry (and Tony in the linked article) in trying to figure out what/how is the missing data.
Wednesday, July 08, 2015
A cool chart that illustrates the relationship between speed and getting infield hits.
Saturday, May 09, 2015
Courtesy of our buddy Alan Nathan. He's showing there's a maximum distance by speed, presumably because of launch angle and backspin. Just a very lovely chart.
Tuesday, May 05, 2015
Terrific piece by Chris.
Chris does what others don't, and that is, look at ALL PA, not just the groundballs. As Chris shows, because Moustakas faces a severe shift, it allows him to go the other way and hit liners. Those liners the other way are a result of the shift. It's a cost to the defense of the shift. Don't say some sh!t like "oh, it's a liner, so, it's irrelevant if there was a shift". No, that's not how this works. You gotta look at it holistically.