Batted Ball
Tuesday, December 19, 2017
In the second of three interesting articles, the author talks about the "carry" of the ball (over two separate articles). And he confirms what I found, and what others found years ago: essentially, 1 extra foot of carry results in 3% more HR. He also makes a good point regarding the fences being "closer": since the fences are not circular, a hitter will more likely hit a HR if he hits the ball closer to the line than if he hits it straightaway. While this is obvious, I like the description of saying that the hitter himself can "move" the fences closer. You can also think of "carry" based on the spin rpm and axis. You can think of shooting pool, where if you hit the ball too off-center, then no matter how high the "exit velocity" of your cue stick is, you won't get any speed along that attack angle.
Sunday, June 18, 2017
One of the new wave of saberists out there, Andrew, does some excellent work. One of the things he's doing is taking "estimated wOBA" and applying it in place of actual outcomes, and then using that to forecast future RA/9. Except. Well, except that the best predictor, even better than FIP, is simply K minus BB per PA. That's right. Completely ignore batted balls, whether hard hit or not, by angle or not. Ignore basestealing, ignore everything. Except K and BB.
His main issue is that he basically used a chart like this:
[Chart not preserved in this copy.]
And simply added up those values for every batted ball. Prima facie, this is entirely reasonable. You have a high pop up? Let's count that as close to a 0 wOBA (i.e. out). You have a 28 degree 115mph shot? Let's count that as close to a 2.000 wOBA (i.e., HR). And I would say 99% of researchers would do exactly that.
But, what if I tell you that the ENTIRETY of a player's batted ball profile can be determined by the frequency of his barrels? That is, rather than assign a value to every batted ball, let's only assign one value, the same value, to simply those 6% of the balls that fall into the "barrels" category?
You'd think I'm crazy, right? Look how strong that relationship is, looking at barrels to wOBA the following year (not wOBA on batted balls, but overall wOBA including BB and K!)
Well, Andrew just demonstrated that it's better to discard 100% of the batted balls, than to include all of them. (Voros is smiling.) And I'm saying, let's at least START by discarding 94% of the batted balls, and focus on the 6% hit at the ideal speed+angle.
Then, you can start adding a bit more. You can add the near-barrels, those well-struck balls that just miss being barrels. You can add the flares and the burners. And so on. Once you do this, then you'll be in a much better position to forecast the future.
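For anyone who wants to try the K minus BB per PA number mentioned above, here is a minimal sketch. The function name and the pitcher lines are made up for illustration; nothing here is taken from Andrew's article.

```python
def k_minus_bb_rate(strikeouts, walks, plate_appearances):
    """Return (K - BB) / PA for one pitcher."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (strikeouts - walks) / plate_appearances


# Hypothetical pitcher lines: (K, BB, PA). These are not real players.
pitchers = {
    "Pitcher A": (220, 45, 800),
    "Pitcher B": (150, 70, 800),
}

# Rank pitchers from best to worst by K minus BB per PA.
ranked = sorted(
    pitchers.items(),
    key=lambda item: k_minus_bb_rate(*item[1]),
    reverse=True,
)

for name, (k, bb, pa) in ranked:
    print(f"{name}: {k_minus_bb_rate(k, bb, pa):.3f}")
```

Note that batted balls never enter the calculation, which is the whole point.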
Monday, May 09, 2016
Pirates are reeling them in.
Just to make sure this isn’t one of those instances where a team is all talk in the spring and no walk in the regular season, Mike Petriello’s got the data, and nobody’s playing a more shallow center field relative to last year than McCutchen. In 2015, McCutchen lined up 316 feet from home plate, on average. This year? He’s averaged a touch under 300 feet (data current as of May 2).
Saturday, May 07, 2016
Everyone's doing it. And just now, Ben over at BIS through Bill's site posted a counter to Pizza's recent article.
However, there's a flaw in these analyses, one that we've been pointing out for years. And one that Bill James also pointed out in a rather lengthy post a few years ago: you cannot limit the analysis to only those times when a ball is put in play. The question is very simple, very very simple: What is the batter's overall production when he is shifted? That's the question. That's the singular simple question. And to answer that you have to (a) know when he's shifted BEFORE seeing the results and (b) include all the outcomes, including BB, SO, HR.
You cannot say that because a batter walked, struck out, or homered, the shift had no effect either way. Au contraire. Given that the batter is responding to the stimulus of a new defensive alignment, EVERYTHING is on the table. How he approaches that situation is at the very heart of the question.
Therefore, every single analysis you see... every single one, without exception... if it doesn't reference a player's overall production, and that means including BB, SO, and HR production, its conclusion can simply be discarded.
***
It also gets worse because some of these trackers are only marking a shift if they think the shift affected the result. That is, a flyball hit to the LF is not being marked as "shift" data, because the reasoning is that regardless of whether there was a shift or not, the result would have been the same. That's silly on its face, and it's silly underneath the surface. I don't know if these trackers are still doing this, but this is wrong.
Monday, May 02, 2016
For some 15 years, I've been talking about the convergence of scouting and performance analysis. That is what sabermetrics really is. Sabermetrics is not just about "stats". Scouting information actually provides the necessary prior to evaluate the resulting performances. If you had zero history on Felix and Armando, and you were told to watch their perfect games, a scout might be able to distinguish differences that the resulting performances (0 for 27) did not. As I noted back in 2009, "The idea is to create a model that is complex and comprehensive enough as to make both performance and scouting data obsolete." That is, the convergence of scouting and performance analysis is when they both disappear. As much as it's not possible for that to be true, the goal is to work toward that end.
And William provides some excellent data points toward that end. In terms of "extrapolation", I would have used Alan's Trajectory Calculator as a (very strong) prior. It's inconceivable that you could have a 100mph 8-second popup that travelled 100 feet. The hitter would have to have intentionally gone for a huge uppercut. Setting that small issue aside (it's on data that frankly won't exist anyway, so it doesn't really matter how you model the parts of a system that can't happen), we get into the fun stuff:
When used in conjunction with Steamer, the impact of exit velocity was still significant, both statistically and practically. We can expect a hitter to outperform his Steamer projected wOBA by roughly three points for each mph of previous-season exit velocity. (We found a similar effect when using either ZiPS projections or an average of ZiPS and Steamer.)
Note that since they are using wOBA (which includes BB and SO), what they are forecasting is not just future batted balls, but all plate appearances. Which may SEEM wrong, or at least weird, but when you think about it: a guy who hits harder has an effect on how a pitcher will pitch to him. You wouldn't just want to forecast his wOBA on Contact based on his exit speed, but his wOBA on ALL his plate appearances. Naturally, you could break it up so you can see wOBA on Contact and wOBA on non-contact, as well as the FREQUENCY with which he makes contact in the future. By looking at future wOBA, all that gets rolled into one.
We've known about the importance of hang time ever since Robert Dudek's simple, yet groundbreaking article, way back in the premier issue of Hardball Times. William continues in that tradition, but also takes advantage of known data points. This is what sabermetrics is all about.
Tuesday, April 19, 2016
Pizza gives us the results of his correlation. Note that since Pizza split his data based on time, we end up with a bias, which is most easily seen with the pitcher data. Setting that aside, I'll present his data, with one extra column:
BIP    r        Regression Amount
10     0.350    19
20     0.527    18
30     0.635    17
40     0.679    19
50     0.732    18
As he showed, the more trials you have, the higher the correlation. This is a given (setting aside systematic bias). If you had a billion trials, you'd have r almost exactly 1. That's why reporting correlation without reporting the number of trials is useless. So, good job by Pizza in showing us this progression.
The extra column I added was the Regression Amount, which is simply determined as (1-r)/r * number of trials. We hope and expect that the regression amount is constant, regardless of number of trials. And, lo and behold, it is! You basically just add 18 batted balls at league average exit velocity, and, voila, you have an estimate of the hitter's true talent level for exit speed.
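Here is a minimal sketch of that calculation, using the (BIP, r) pairs from the table above. The function names, the 89 mph league average, and the 95 mph hitter are assumptions made up for illustration; only the formula and the table values come from the post.

```python
def regression_amount(r, trials):
    """Number of league-average trials to add: (1 - r) / r * trials."""
    return (1 - r) / r * trials


def regress_to_mean(observed_mean, trials, league_mean, ballast):
    """Blend the observed average with `ballast` trials at the league mean."""
    return (observed_mean * trials + league_mean * ballast) / (trials + ballast)


# (BIP, r) pairs from the table above.
PIZZA_TABLE = [(10, 0.350), (20, 0.527), (30, 0.635), (40, 0.679), (50, 0.732)]

amounts = [round(regression_amount(r, n)) for n, r in PIZZA_TABLE]
print(amounts)

# Hypothetical hitter: 95 mph average over 18 batted balls, in a league
# that averages 89 mph (both numbers are made up). With a ballast of 18,
# the estimate lands halfway between the two.
estimate = regress_to_mean(95.0, 18, 89.0, 18)
print(f"{estimate:.1f} mph")
```

With only 18 batted balls, the observed average and the league average get equal weight. At 180 batted balls, the observed average would carry ten times the weight of the ballast.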
If you remember, we did something similar back in the early PITCHf/x days for a pitcher's THROWING speed. And if I remember right, the regression amount we added was... 1 pitch. That is, a pitcher's true talent throwing speed is almost instantly known.
Which of course makes sense, since there's no variable between the pitcher and his release that we really have to consider. With a batter, he has to respond to the pitcher, and he may not hit the ball squarely, or on the same plane. So, there's a gap between his base talent level and his results.
The closer what you do is to what you deliver, the less regression you need. In addition, the wider the overall spread in talent, the less regression you need. When you have one guy averaging an exit speed of 90mph and another at 75mph, that's a wide spread to begin with. It's much quicker to determine that correlation than if everyone exited at 88-90mph.
Friday, February 05, 2016
[Chart not preserved in this copy.]
This is a terrific chart from @darenw and @StatCast. I'd make each line proportionate to the frequency, so it really stands out in terms of Cabrera's swing plane matching his production level. That is, you'd like to see that where he gets the best production is also when he swings it the most.
Anyway, so Daren shows both batting average and HR, which is one of the very few times I like batting average. Batting average directly corresponds to successful contact, and HR directly corresponds to "perfect" contact.
Missing in there are doubles+triples, which is where wOBA comes in. However, given that both BA and HR are key pieces of data that you must show, the question is if we want to introduce a third piece of data, and whether that third piece should be wOBA or SLG. Let me make the case for SLG (even though my preference is wOBA).
Batting average has a "1" for 1B, 2B, 3B, HR. SLG has values of 1, 2, 3, 4, respectively. If you take 60% of batting average and 40% of SLG, you get this: 1, 1.4, 1.8, 2.2.
wOBA is 0.9, 1.25, 1.6, 2.0. If you multiply all that by 1.11 for scaling purposes, you get: 1.00, 1.39, 1.78, 2.22. What does this mean? In effect, wOBA (on contacted balls) is 60% batting average and 40% SLG. You can therefore introduce SLG, instead of wOBA, and you'll get to convey the information you need.
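The arithmetic above can be checked in a few lines. This is just a sketch: the variable names are mine, and the weights are the rounded ones quoted in the text.

```python
HIT_TYPES = ("1B", "2B", "3B", "HR")
BA_VALUES = (1, 1, 1, 1)              # batting average counts every hit as 1
SLG_VALUES = (1, 2, 3, 4)             # SLG counts total bases
WOBA_VALUES = (0.9, 1.25, 1.6, 2.0)   # rounded wOBA weights quoted above

# 60% batting average plus 40% SLG, per hit type.
blend = [0.6 * ba + 0.4 * slg for ba, slg in zip(BA_VALUES, SLG_VALUES)]

# wOBA weights multiplied by 1.11 so that a single is worth about 1.
scaled_woba = [1.11 * w for w in WOBA_VALUES]

for hit, b, w in zip(HIT_TYPES, blend, scaled_woba):
    print(f"{hit}: blend={b:.2f}  scaled wOBA={w:.2f}  diff={b - w:+.3f}")

largest_gap = max(abs(b - w) for b, w in zip(blend, scaled_woba))
```

The two scales never differ by more than a few hundredths per hit, which is why SLG alongside batting average conveys essentially what wOBA would.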
I should point out however that you CANNOT IGNORE SF, not for the purposes of the chart above. While batting average is an official stat, and so, we can't redefine it, "batting average on contacted balls" is not an official stat. So, we get to control the denominator. You can't throw away sac flies from contacted balls in the chart above. It's part of the frequency.
As well, reaching on error: those should count as well. After all, if you have a high exit velocity on a downward trajectory, that might increase the chance of error. Again, that's directly tied in to success. At some point, and soon, we'll think of "errors" being "bad" for offense as one of the most confusing things we'd have ever considered.
Monday, January 25, 2016
Suppose you are an outfielder that is positioned very well. You are involved in fairly aggressive shifting. Sharp flyballs or liners get caught at a higher rate on your team than on other teams as a result. Because the balls are hit hard, you likely would not have run too much. Maybe these balls are caught at an average travel distance of 30 feet.
Suppose you are an outfielder that always plays the same spot all the time. Your team doesn't believe in shifting, not even by batting hand. You won't catch most of those liners, but you will catch all those lazy flyballs that you jog 60 feet to catch.
Which outfielder did better? This is the concern when you think of average distance travelled as necessarily a good thing. Now, we don't necessarily live in such extreme conditions, so all that means is that there are layers of nuance to account for. All to say: be careful how you interpret data, like in this article.
These presentations are critical first steps, but they are the beginning. We need a lot of sifting before we can come to an opinion.
Tuesday, January 05, 2016
Good stuff from Jeff. Having read Jeff for many years, I think Jeff would likely characterize himself as NOT a stathead. Taking that as a given, I would also say that Jeff is an extreme saberist.
Jeff has the first key ingredient, and that is, he is a subject matter expert. What he thinks about, what he uncovers, what he looks for, these are things that only a true baseball fan would even think about. A non-baseball statistics expert wouldn't necessarily think about such things. And a pure stathead might stumble upon it. But a non-stathead baseball fan? Yes, those are the kinds of things he thinks about.
Jeff has a second key ingredient, and that's to be able to translate the idea into something that can be organized into various components. And once you have those two things, all it takes is to roll up your sleeves and look for the right data in the right way. Then you get saber-magic. And you get nuggets like this.
And that Jeff is a writer of the quality of Joe Posnanski makes Jeff an extremely readable saberist, part of the Bill James family. Obviously, no one rises to Bill's level, but Jeff has all the little things that Bill has, like comparing Ken Griffey Jr to Willie Mays.
Monday, November 30, 2015
This was one of my favorite threads (start at post 8), notably because of Brian C's involvement a little later on, who does a tremendous job with his presentation. I bring up this thread because of Jeff's recent article.
Tuesday, August 11, 2015
UPDATE: see comment 7.
***
As I was reading this article, I was thinking: "human stringers are going to have a higher correlation than the camera/Doppler with ISO". And, it's true.
***
Interlude/update: I just read the comments, and Rally said the same thing:
When I see stronger correlations with ISO, SLG, etc. to hard hit%, the first question that pops into my head is how much scoring bias is here. If you’re trying to decide whether a borderline hit was hard or medium, I’d guess that the one that falls for a double is more likely to be scored hard hit than the same ball caught by an outfielder.
***
The author reports that the BIS stringer data has an r-squared of .70 (r=.84) between ISO and human-tracked "contact strength". ISO you will note is SLG minus batting average (basically, extra base hits, with extra weight for HR).
The author also reports that the correlation between batted ball speed and ISO is r=.62.
So, does this mean that how a human established "contact strength" is better than how a camera/Doppler/algorithm does it? No. It's basically evidence that a human stringer is more likely to mark a batted ball with higher "contact strength" if it went for extra bases than if it was caught.
Possible biases could be with ground balls. You can have a three-hopper going to the SS with an exit velocity of 95mph. But if it's a routine out at 1B, what are the chances that the stringer is going to mark that as hard-hit? And similarly, a ball launched at 40-45 degrees at 95mph won't be marked as hard-hit as often as those launched at 15-20 degrees at 95mph.
We have to accept one thing: the exit velocities being reported (at least through Trackman/StatCast) are the gold standard. (Sportvision is the silver standard.) The human tagging of a play as hard-hit or not is inferior. And so, if say a human tags 20% (I'm making up the number) of exit velocities of 70-75mph as "hard hit", and furthermore, among those 20% are gap hits, that's a bias. A human bias.
In order to validate BIS (or any human stringer), we need to see the correlation of the BIS data and the outcomes (1b, 2b, 3b, hr, out) against the Gold (or Silver) standard (exit speed and launch angle). Then the bias will be apparent. And it'll explain the correlations noted above.
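To compare the two correlations quoted above on the same scale, you can convert between r and r-squared. A small sketch (the helper names are mine):

```python
import math


def r_from_r_squared(r_squared):
    """Magnitude of the correlation implied by an r-squared value."""
    return math.sqrt(r_squared)


def r_squared_from_r(r):
    """Share of variance explained for a given correlation."""
    return r * r


# Human-tracked "contact strength" vs ISO: r-squared of .70, as reported.
stringer_r = r_from_r_squared(0.70)

# Batted ball speed vs ISO: r of .62, as reported.
speed_r_squared = r_squared_from_r(0.62)

print(f"stringer r = {stringer_r:.2f}")
print(f"batted ball speed r-squared = {speed_r_squared:.2f}")
```

Put that way, the gap between the human tag and the measured speed is large, and that is the gap the scoring bias would have to explain.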
Sunday, August 09, 2015
Great job by Henry (and Tony in the linked article) in trying to figure out what/how is the missing data.
Wednesday, July 08, 2015
A cool chart that illustrates the relationship between speed and getting infield hits.
Saturday, May 09, 2015
Courtesy of our buddy Alan Nathan. He's showing there's a maximum distance by speed, presumably because of launch angle and backspin. Just a very lovely chart.
Tuesday, May 05, 2015
Terrific piece by Chris.
Chris does what others don't, and that is, look at ALL PA, not just the groundballs. As Chris shows, because Moustakas faces a severe shift, he is able to go the other way and hit liners. Those liners the other way are a result of the shift. It's a cost to the defense of the shift. Don't say some sh!t like "oh, it's a liner, so, it's irrelevant if there was a shift". No, that's not how this works. You gotta look at it holistically.
Wednesday, March 18, 2015
Good stuff from Shane, as he follows up on some findings in The Book and looks at things in a more granular manner. He's basically showing that when same-type pitchers/batters face off, you get extreme results, at the expense of line drives. But when opposite-type players face off, they cancel each other out, resulting in a few more line drives. It's a nice way to show the effect.
Thursday, June 12, 2014
Lewis makes a nice statement here:
I fear we are becoming far too quick to identify outlier pitchers as exceptions to DIPS norms rather than understanding them to be manifestations of typical population variance.
I don't know who this "we" is, but the main point is interesting in its description. The basic idea is that we don't have true exceptions, or even true outliers. What we do have is simply a distribution of talent. With K/PA, that distribution of talent is quite wide in MLB. With BABIP, that distribution of talent is quite narrow. In either case, it's a distribution, not a bunch of players in one spot, and then a few exceptions on the tails.
This is why we apply Regression Toward The Mean. We have a reasonable idea as to the width of the distribution of BABIP talent in MLB. Given the observed BABIP and the number of BIP those observations are based on (plus whatever park factors we have and if you can use FB/GB/LD tendencies, all the better), we can make an estimate as to what each pitcher's BABIP talent is.
And when it comes to an extreme case like Cain, we're going to move that needle somewhat toward the population mean, and we move it less, the more data we have. What we end up with therefore is still a distribution, no outliers, no exceptions.
I don't know that it helps to be this technically correct. But to the extent that we shouldn't be lazy about it, I guess I agree.
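As a sketch of "we move it less, the more data we have": the share of the gap to the league mean that gets closed shrinks as BIP grows. Every number below is a placeholder; the league BABIP of .300, the ballast of 2000 BIP, and the .260 pitcher are assumptions for illustration, not estimates from this post.

```python
def shrink_weight(balls_in_play, ballast_bip):
    """Fraction of the gap to the league mean that regression closes."""
    return ballast_bip / (balls_in_play + ballast_bip)


def estimate_babip_talent(observed_babip, balls_in_play, league_babip, ballast_bip):
    """Move the observed BABIP toward the league mean by the shrink weight."""
    weight = shrink_weight(balls_in_play, ballast_bip)
    return observed_babip + weight * (league_babip - observed_babip)


LEAGUE_BABIP = 0.300   # placeholder league mean
BALLAST_BIP = 2000     # placeholder regression amount, NOT a measured value

# The same observed .260 BABIP at different sample sizes (hypothetical pitcher).
for bip in (500, 2000, 4000):
    talent = estimate_babip_talent(0.260, bip, LEAGUE_BABIP, BALLAST_BIP)
    print(f"{bip} BIP: weight={shrink_weight(bip, BALLAST_BIP):.2f}, estimate={talent:.3f}")
```

Run over every pitcher, the estimates still form a distribution, just a narrower one than the observed BABIPs.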
Friday, April 04, 2014
David is a bit of a fan.
Sunday, March 02, 2014
This article from Max reminded me of this article by MGL ten years ago. It looks like Max is on the right path. I didn't look at all the particulars, so hopefully the Straight Arrow readers will critique it.