Statcast
StatCast
Thursday, August 11, 2016
Mike Petriello asked me this interesting question. I've only given it passing thought in the past, but let's see what we can come up with.
Say a batter averages 90 mph on contact that ends the at bat, over 400 such occurrences. He averages another 80 mph on foul balls that keep the at bat alive, over 200 such occurrences. And he swings and misses another 100 times for, obviously, 0 mph. Generally speaking, only those 400 occurrences get reported.
It's obviously not fair to discard those 100 swings and misses, since a player whose intent is to swing hard all the time gets a "free pass" any time he swings and misses. A player who displays more bat control may swing and miss less, but at the cost of exit speed once he does make contact.
What to do? Well, we can focus on linear weights and how the run value is impacted. Generally speaking, a swing and miss will cost you around .05 to .10 runs. That is, by swinging and missing you are placed in a more disadvantaged plate count (or worse, you might have struck out).
What happens when you hit a ball under 90 mph? Generally speaking, if you can't meet that speed threshold, your wOBA is around .250, compared to an overall average of .350. And a 100-point difference in wOBA is around .08 runs. That is, a swing-and-miss costs roughly the same as a low-exit-speed contact.
So, I would think therefore that a swing-and-miss should "count" the same as an under-90mph contact. If the average of such contacts is say 75mph (I'll have to check, but let's go with this for the moment), then I would count the "equivalent" exit speed of a swing-and-miss at 75mph.
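To make the arithmetic concrete, here is a minimal sketch that folds the swings and misses into the reported average at the placeholder 75 mph. The function name and the choice to leave the foul balls out are assumptions for illustration; the post does not say how fouls should be handled.

```python
def adjusted_exit_speed(contact_n, contact_mph, miss_n, miss_equiv_mph=75.0):
    """Average exit speed when each swing-and-miss counts as miss_equiv_mph.

    miss_equiv_mph=75 is the placeholder value from the text, not a measured one.
    Foul balls are left out here (an assumption).
    """
    total_swings = contact_n + miss_n
    total_mph = contact_n * contact_mph + miss_n * miss_equiv_mph
    return total_mph / total_swings

# The example batter: 400 at-bat-ending contacts at 90 mph, 100 swings and misses
print(adjusted_exit_speed(400, 90.0, 100))
```

Under these assumptions the example batter drops from a reported 90 mph to 87 mph.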
That's my first thought on the matter. What do you guys think?
Wednesday, August 10, 2016
We saw a couple of cool charts that showed the wOBA based on the exit speed, vertical launch angle, and horizontal spray direction.
And here's the third in the set, with vertical launch angle on the y-axis, speed on the x-axis, with slideshow by horizontal spray direction. As you can see, compared to the above two, the horizontal spray direction isn't as impactful.
***
A few weeks ago, long-time blog-buddy and all-round good guy David Appelman at Fangraphs(*) said something like "what I like about your stuff is that it's simple". Now, it may not sound complimentary, but it was all in the delivery. You may recognize a form of that from a popular quote: "everything should be made as simple as possible... but no simpler".
(*) If you have a chance to work for David, you should.
That's basically how I try to beat down a problem: amidst chaotic numbers, find some way to simplify them. This is best encapsulated with Marcel The Monkey Forecasting System, or The Marcels. I spent dozens(*) of hours trying to come up with a great forecasting system, only to always go back to the drawing board with one main takeaway: I don't need to overengineer it.
(*) Hundreds actually, but for my sanity, I have to say dozens.
***
We have the overwhelming StatCast data and we need to come up with a Quality of Contact for the triplet of numbers of exit speed, vertical launch, and horizontal spray. We want to do it in a transparent manner, and have it run very quickly, with 188,180 data points (and climbing) at our disposal.
The first step is to do what many people do, and that is to bin all the data into units of +/- 1. So, a batted ball at 99.2 mph, 18.4 degrees of vertical launch, and -16.1 degrees of spray direction would go in the 100,18,-16 bin. We have 74,520 bins, which is every combination for 40 to 110 mph, -30 to +60 vertical launch angle, and -44 to +44 horizontal spray direction. We bin all our 188,180 data points appropriately.
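The binning step can be sketched in a few lines. This assumes each bin is centered on an even number and spans +/- 1 around it, which is what the 100,18,-16 example and the 74,520 total both imply. The function names and the handling of exact ties (such as 99.0 mph) are assumptions.

```python
import math

BIN_WIDTH = 2  # each bin covers its center +/- 1 unit

def to_bin(value, width=BIN_WIDTH):
    """Snap a measurement to the nearest bin center (a multiple of width)."""
    # floor(x + 0.5) sends exact ties upward; the post does not say how ties are broken
    return int(math.floor(value / width + 0.5)) * width

def bin_triplet(speed_mph, launch_deg, spray_deg):
    """Return the (speed, launch, spray) bin for one batted ball."""
    return (to_bin(speed_mph), to_bin(launch_deg), to_bin(spray_deg))

def bin_centers(low, high, width=BIN_WIDTH):
    """All bin centers from low to high inclusive."""
    return list(range(low, high + 1, width))

speed_bins = bin_centers(40, 110)    # 36 bins
launch_bins = bin_centers(-30, 60)   # 46 bins
spray_bins = bin_centers(-44, 44)    # 45 bins
total_bins = len(speed_bins) * len(launch_bins) * len(spray_bins)

print(bin_triplet(99.2, 18.4, -16.1))  # (100, 18, -16)
print(total_bins)                      # 74520
```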
The most popular bin is this one: 102, +14, +6, meaning 102mph (+/- 1), +14 degrees of launch (+/- 1), +6 degrees of spray (+/- 1). There are 41 batted balls in this bin. Unfortunately, we end up with 33,892 bins with no data at all, or 45% of all the bins. As you can see, very chaotic data, with almost half the bins empty, and the other half averaging only 4 or 5 batted balls. This for example is what it looks like, based purely on binning, at a horizontal spray direction of -36 degrees, with the vertical angle on the y-axis and exit speed on the x-axis:
The black spots are where we have no data at all. As you can see, lots of gaps. But also where we have contiguous data, we can start to see potential patterns.
Now, since we've used up all our data, where can we find more data? The next step was to expand each bin by creating a second circle(*) that was 3 times the size of the first circle. In essence, a rolling average, as each bin "borrows" data from neighboring bins (in addition to still keeping its own data). From that perspective, the most popular bin is this one: 102, +14, +4. It had 31 batted balls within the inner circle, and now 94 balls in this larger circle. Here's what that looks like using the same -36 degrees chart:
(*) I may be saying circle, which is a 2-dimensional object, but I actually mean a spheroid, a 3-D object.
Some of the gaps start getting filled, and we get a bit more smoothing overall. And I keep going, tripling the size of each circle. The most popular bin here has 231 batted balls, in the 102, +16, +6 bin. (This bin had 20 batted balls in the most inner circle, and 72 in the next outer circle.) We're now at this point with the -36 degree chart:
Most gaps are filled, strong patterns are emerging. And on and on we go, always tripling the size of each bin until I got to +/- 9 units of each of the horizontal, vertical, and speed parameters. Then I put in one final one, and that was the "mirror" of each of those bins, so that bin 102, +14, +6 would be matched to 102, +14, -6. (Remember that since 0 degrees of spray is up-the-middle, then +6 degrees is a mirror of -6.)
Now, that lays the groundwork for every bin, with 8 circles for each bin. The next step was to weight each circle, with each outer circle getting progressively less weight. In a few cases, I only needed to weight the innermost circle, while discarding all the outer circles because I had enough data. In other cases, I needed to use 2 circles, or 3 circles, etc. And in those cases where there was little batted ball data to go on because of the odd combination of the triplet, I ended up using all 8 circles.
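As a rough illustration of the idea (not the actual SQL script), here is a sketch of borrowing from progressively larger neighborhoods with shrinking weights, stopping once a bin has enough data. The radii, the `min_balls` cutoff, and the `decay` factor are all hypothetical. The neighborhoods here are cubes rather than spheroids, and the mirror step is left out.

```python
from collections import defaultdict

def build_counts(batted_balls):
    """Map each (speed, launch, spray) bin to [number of balls, summed wOBA value]."""
    counts = defaultdict(lambda: [0, 0.0])
    for bin_key, woba_value in batted_balls:
        counts[bin_key][0] += 1
        counts[bin_key][1] += woba_value
    return counts

def neighborhood(counts, center, radius):
    """Balls and summed wOBA value in all bins within +/- radius units of center."""
    n, total = 0, 0.0
    s0, v0, h0 = center
    for (s, v, h), (k, w) in counts.items():
        if abs(s - s0) <= radius and abs(v - v0) <= radius and abs(h - h0) <= radius:
            n += k
            total += w
    return n, total

def smoothed_woba(counts, center, radii=(0, 2, 6), min_balls=30, decay=0.5):
    """Weighted wOBA for one bin; each outer shell gets less weight (hypothetical values)."""
    n_acc, w_acc = 0.0, 0.0
    prev_n, prev_total = 0, 0.0
    weight = 1.0
    for radius in radii:
        n, total = neighborhood(counts, center, radius)
        # only the data newly added by this larger neighborhood gets the reduced weight
        n_acc += weight * (n - prev_n)
        w_acc += weight * (total - prev_total)
        prev_n, prev_total = n, total
        if n >= min_balls:
            break  # enough data: discard the remaining outer neighborhoods
        weight *= decay
    return w_acc / n_acc if n_acc else None

# Toy data (made up): two hits in one bin, two outs in the adjacent speed bin
toy = build_counts([((100, 18, 0), 1.0)] * 2 + [((102, 18, 0), 0.0)] * 2)
print(smoothed_woba(toy, (100, 18, 0), min_balls=2))  # inner bin only
print(smoothed_woba(toy, (100, 18, 0), min_balls=4))  # borrows from the neighbor
```

The design point this tries to capture is the one in the text: a well-populated bin relies on its own data, while a sparse bin leans on its neighbors, and the outer data always counts for less.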
The end result for this particular angle is this:
As you can see, all gaps are filled, and a fairly strong pattern emerges. We do have splotches here and there. Those can be removed by increasing the smoothing pattern. But, there's a danger of over-smoothing, as some splotches *should* remain. We have 7 fielders on the field, all spread out, so, we have to be careful about over-smoothing. And that's really the tradeoff between over-smoothing and under-smoothing.
Thanks to gifmaker.me, here's a slideshow of the smoothing process:
Now, can this be improved upon? You bet. I can spend dozens(*) more hours refining this process, and make it more sensitive to various triplets. For example, certain triplets are more sensitive than others to their neighbors. Certain triplets have a stronger relationship to their "mirrors" on the other side of the field (think LF/RF gaps). The goal as always is transparency and speed. So, the entire SQL script, including preparing the source data from scratch, runs fairly quickly to generate the chart of 74,520 bins. Once that chart is generated, then we can retrieve results for any combination of triplets instantaneously. As it stands, the results came out fairly well, as you can judge by the earlier slide shows. And I should note that great work has already been coming out, notably over at Fangraphs and Hardball Times.
(*) Even hundreds.
***
As I was working on this project, I had a thought I couldn't unthink. I have a path to create a version 2 for Quality of Contact that will go in a somewhat different direction. That is months away, but it's fairly exciting what will come of it. It's complex and intricate, but actually very intuitive. It will be much easier to explain. And ultimately, it will be even faster to run, and be more robust. As it's a paradigm shift, it is just going to take time for me to code version 2.
Until then, I coded up this version 1 so that we have a baseline reference. So, this first version is Marcel-like, a simple process, but no simpler. And in a few months, I'll roll out the second version. At the same time, I can keep plugging away at version 1, improving it on the periphery. I can make adjustments to version 1 based on feedback here. This brings me back to those Marcel days, as every time I decide to move on to something else, it pulls me back in. We're just getting started.
Monday, August 08, 2016
Similar to the earlier chart, but this time the exit speed is on the y-axis, and each slide is 8 degrees of vertical launch angle. Just remember that at 28 degrees it's ideal HR angle, and 12 degrees is ideal line drive angle. Everything else fits around that.
The green box is just a reference point, which is 100 down to 80 mph, at 0 degree of spray.
This is an animated gif. Hopefully this comes out ok. The redder the cell, the higher the wOBA. The bluer the cell, the lower the wOBA. White is average.
- On the x-axis is the horizontal spray angle, from 3B line to 1B line.
- On the y-axis is the vertical launch angle, from -30 to +60 degrees.
- I took a snapshot of various exit speeds, from 40 mph to 110 mph in steps of 10. It starts at 110 down to 40, then back up again to 110, and it continues to loop. Thanks to http://gifmaker.me/
Note: If you want to get your bearings, when the image gets to the 100mph slide, locate the center blue spot, and that's 0 degrees horizontal spray, 18 degrees vertical launch. Just hover your cursor over that spot, and that'll center your viewing. Sorry for not thinking about it before I created the gif.
Friday, July 29, 2016
As we discussed last time, we need to make a distinction between estimating what might have happened, and predicting what could happen.
When we estimate what might have happened, we start with what DID happen, and then we remove one or more variables. A simple example is if a team gets 14 hits and 3 walks in a game, but manages to not score a single run. If you accept that the 14 hits and 3 walks (and 27 outs) happened, but the SEQUENCING of those events can be changed, then we can provide an estimate of what might have happened... if sequencing of events was not fixed. A quick estimate would suggest that this team would score about six to seven runs.
What could we predict would happen in the future knowing that we have a team that got 14 hits and 3 walks? In that case, we would infer that a team that got 14 hits and 3 walks is an above average hitting team. We would NOT predict they'd get exactly 14 hits and exactly 3 walks, but we'd predict they'd get an above average number of hits+walks. And we might predict that such a team would end up scoring about five runs a game.
The difference boils down to: (a) focusing on the things that happened that you care about and ignore the other things that happened that you don't care about, and (b) inferring what could happen based on what did happen.
If you choose (a), you are trying to estimate what might have happened if you could change one or more things. In other words, you are recreating the past.
If you choose (b), you are trying to evaluate the talent and environment so you can come up with an expected value. In other words, you are providing a forecast of events that have yet to unfold.
***
This chart shows the wOBA if all you know is the exit speed of the batted ball.
As you can see, there's a lot of up/down at speeds 20-60mph, as you have things like bunts and checked swings and slow rollers that conspire against getting a smooth line. The inflection point happens at close to 90mph, and once you hit 95mph, you are on a straight upward slope, up until 110mph or so. After that, you get into something we learned recently: you maximize your exit speed if you do NOT loft the ball. But that comes at a cost of performance, which is why we have so many 115-120 mph groundballs that lead to outs: the batter hit the ball too square.
So, if you are interested in what might have happened if all you knew was the exit speed of the batted ball, you would simply consult this chart. You are saying that the vertical launch angle and the horizontal spray angle (and by extension the distance and hang time) are variables that you don't want to consider. You just want to know what might have happened if you got a "typical" launch and spray angles.
Let's take the above chart, but zoom in at the 80-120mph range.
Between 80 and 90mph, we estimate a wOBA of .200 to .250. At 100mph, we estimate a wOBA of .600, and at 105mph, an estimate of .900 wOBA. That's our estimate of what might have happened, if all we know is the exit speed.
***
Now, we know it takes extraordinary talent to hit a groundball 120mph. This is something that basically is in the wheelhouse of Giancarlo Stanton. If you have someone who can hit a ball that hard, regardless of whether it's a negative launch angle, or an ideal launch angle, you are dealing with a great hitter. We can infer the talent level of a hitter who can hit a ball that hard. We can forecast the future, and we can provide an expected value for that hitter.
We can forecast by looking at what has happened in one time period, and see how it correlates with another time period. And when we do this, we come up with this chart:
We can see that we get an inflection point of 88mph. Up until then, any batted ball hit under 88mph provides no information as to the quality of our batter. Whether the batter got an 80mph contact or 60mph contact, it has no impact to determining the true talent level of our hitter. A ball hit at 100mph allows us to infer a talent level of .500 wOBA, and at 105 mph, an estimate of just over .600 wOBA. And at 120mph, it's a wOBA of 1.200.
***
While these two charts are similar, we can see a few differences:
1. We don't care about speeds under 88mph
2. The slopes are much different, with the (b) chart rising about half as fast as the (a) chart
3. Speeds above 115mph keep their prominence in the (b) chart, unlike in the (a) chart
So, before we go on to consider the vertical angle (and eventually the spray angle), we need to appreciate which track of questions we are after.
Friday, July 22, 2016
Suppose you "win" when you roll a 5 or 6, and you lose rolling 1 through 4. The "expected value" of your dice roll is a win rate of 33.3%. We would therefore forecast you to win 33.3% of the time. The true talent level of the dice roll is 33.3%.
Suppose you "win" when your dice roll is higher than your opponent's by at least 2 points. So, when your opponent rolls 1, you can roll 3 or higher to win. If he rolls a 2, you need to roll 4 or higher. He rolls 3, you need to roll 5 or 6. If he rolls 4, you need to roll 6. That gives you 27.8% chance of winning. That's your expected value, that's your true talent, that's what we would forecast as your chances of winning.
Now, suppose we don't know what your opponent rolled. We just know that you rolled a 5. Since you win if your opponent rolls 1, 2, or 3, rolling a 5 is worth "50%". We therefore estimate that you would have won 50% of the time, if you kept rolling a 5 over and over and over again. What we are doing in this scenario is saying that your opponent's roll IN THAT INSTANT didn't matter. We estimate that KNOWING you rolled a 5, we will therefore estimate its value as 50%.
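The three dice scenarios above can be checked by brute-force enumeration. This is just a small sketch, with function names of my own choosing:

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair six-sided die

def win_simple(roll):
    """First game: you win on a 5 or 6."""
    return roll >= 5

def win_margin(mine, theirs, margin=2):
    """Second game: you win if you beat your opponent by at least 2."""
    return mine - theirs >= margin

# Forecast for the first game
p_simple = Fraction(sum(win_simple(r) for r in FACES), 6)

# Forecast for the second game, over all 36 combinations of rolls
p_margin = Fraction(sum(win_margin(m, t) for m in FACES for t in FACES), 36)

# Estimated value of a roll of 5, ignoring what the opponent actually rolled
p_given_five = Fraction(sum(win_margin(5, t) for t in FACES), 6)

print(float(p_simple), float(p_margin), float(p_given_five))
```

The first two numbers are forecasts (33.3% and 27.8%), while the third (50%) is an estimate of what one known event was worth.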
***
We know the run values of hits, walks, HR and strikeouts. We can estimate what might have happened if we did not know the sequencing of those events. And we would make that kind of estimate because we strongly suspect that sequencing of events is not a skill, but just a matter of circumstance (for the most part). We therefore KNOW the events, but not the sequencing in making our estimate.
But that is still NOT the same thing as forecasting what could happen. And that is because strikeouts are far more predictive of future events than non-HR hits. Even walks are more predictive than non-HR hits. And this is true to the point that the value of a walk is about the same as the value of a non-HR hit, in forecasting future runs. In other words, the "true talent" run value of a walk is the same as the true talent run value of a non-HR hit. That's how we would forecast.
See, what happens is that walks are an indicator of talent, much more than non-HR hits. What we care about, in terms of true talent or forecasting (which is essentially the same thing), is what is innate. And walks are more innate to a player than non-HR hits.
FIP is an example of an estimate of what might have happened.
But if you want to predict what could happen, you would NOT use FIP, not in its current state. What you want is FUTURE FIP.
How do we change the classic FIP?
FIP = (13*HR + 3*BB - 2*SO)/IP + 3.2
Here then is my first stab at…
FutureFIP = (6*HR + 2*BB - 2.5*SO)/IP + 5.12
As you can see, the HR has about twice the impact when estimating runs as it does when predicting runs. And whereas the walk matters more than the K when estimating runs, it's the K that is more predictive than the walk in terms of future runs.
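Here is a quick sketch of the two formulas side by side, using the coefficients and constants exactly as written above. The pitcher line in the example is made up for illustration.

```python
def classic_fip(hr, bb, so, ip, constant=3.2):
    """Classic FIP as written in the post: an estimate of what might have happened."""
    return (13 * hr + 3 * bb - 2 * so) / ip + constant

def future_fip(hr, bb, so, ip, constant=5.12):
    """The post's first-stab FutureFIP: a forecast of what could happen."""
    return (6 * hr + 2 * bb - 2.5 * so) / ip + constant

# Hypothetical pitcher line: 20 HR, 50 BB, 200 SO in 200 IP
print(classic_fip(20, 50, 200, 200))
print(future_fip(20, 50, 200, 200))
```

For this made-up line the classic version comes out around 3.25 and the forecasting version around 3.72, so the two formulas can disagree noticeably about the same pitcher.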
***
Batted ball exit speed: there's no question that an exit speed of 95mph is much more preferable than 90mph IN THAT INSTANT: a .363 wOBA v .244 wOBA, as of today, a difference of close to 120 wOBA points. That is, to estimate what might have happened, and not knowing anything else, 95mph has much more value than 90mph. But in terms of forecasting, the value of 95 is not that much higher than 90. As we will learn in the coming weeks, it's about half that much, about 60 points of wOBA. That is because we are trying to infer what 95mph tells us about a hitter compared to 90mph.
So, we have to be careful about what we are trying to do when using data. It may be trying to predict the future, which is the same thing as establishing the true talent level, which is the same thing as establishing the expected value.
Or it may be trying to estimate what happened, which means taking as a point of fact the event that happened, and assuming SOME circumstances as having happened, and other circumstances as not having existed.
These two things have some relationship, but they are really two different answers to two different questions. And both questions are valid in their own way.
Remember this, as we get more into this in the coming weeks.
Wednesday, July 20, 2016
This image below packs a ton, so bear with me a little as you click it and we can talk about it.
Tuesday, July 19, 2016
After the HR Derby, I tweeted this out, which revealed something that is obvious when you think about it, but maybe isn't appreciated enough. In order to hit the ball the hardest, you want to get "all" of the ball. But in order to hit the ball the farthest, you need to have a bit of an offset, trading speed for loft.
Alan has a terrific article on the subject, with this descriptive chart, where he shows you need a one inch offset to maximize distance, with a cost of a few mph of exit speed.
A word first on optimal launch angles. For a hitter, he maximizes hits at 12 degrees, but he maximizes home runs at 28 degrees. At 20 degrees, it's a "worst of the best of both worlds": you still get great results at 20 degrees, just not as good as at 12, nor as at 28. Indeed, one can even reason that a batter's ideal launch angle is 20 degrees AS A MEDIAN, knowing that if he "mis-angles" a bit low or a bit high, he'll get better results.
So, what I like to do is create launch angle bands in groups of 8 degrees. We have the two ideal launch angles of 8-16 (median of 12) and 24-32 (median of 28). Then we fill in the rest: under 0, 0-8, 16-24, 32-40, 40+.
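The banding above is easy to express in code. In this sketch the lower edge of each band is treated as inclusive (so exactly 16 degrees lands in 16-24), which is an assumption since the post does not say where the edges go.

```python
def launch_band(angle_deg):
    """Assign a vertical launch angle to one of the 8-degree bands."""
    if angle_deg < 0:
        return "under 0"
    if angle_deg >= 40:
        return "40+"
    low = int(angle_deg // 8) * 8  # lower edge inclusive (assumption)
    return f"{low}-{low + 8}"

for angle in (-5, 4, 12, 20, 28, 36, 45):
    print(angle, launch_band(angle))
```

The two ideal angles of 12 and 28 degrees fall in the middle of the 8-16 and 24-32 bands, as intended.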
Repeating the methodology of the study I did last night, I now focus on the vertical launch angle, to see if there's any difference. And we do see a difference! Remember, I am looking at the same hitter-pitcher pairs year to year. And the hitters are getting fewer groundballs, a bit fewer balls in the traditional liner angle (8-16 degrees), and a bit more balls where we'd normally see HR (24-32 degrees).
Now, this could happen either because they are "mis-angling low", meaning they are keeping the same attack angle, and just getting under the ball more. Of course if you do that, your exit speed will ALSO decrease. Which we don't find. Or, the hitters are changing the way they are batting, just slightly enough, that they are launching the balls higher with a matching attack angle. And maybe improving their approach slightly so they get a bit more of the ball (1 mph of higher exit speed), and at an angle that leads to more HR.
UPDATE:
Doing the same "shifting" exercise, the numbers are fairly close if we compare counts this year to counts last year when off by 1 degree:
Monday, July 18, 2016
Alan Nathan has an excellent article where he reasons:
Also shown in the figure and table are the results of shifting the 2015 data by +1.5 mph, producing a curve that overlaps essentially perfectly with the 2016 curve and results in the approximately the same number of home runs.
Alan looked at all batted ball events without controlling for the identity of the batter or pitcher. It's simply the sheer quantity of batted ball events. If the talent changes year to year, which it obviously does, then this will have some effect. How much? Well, that's why we study these things.
Let me try something a bit different. What I will do is look at each batter-pitcher combination through June 30, 2015 and compare that to their performance in 2016 through June 30. For example Greinke faced CarGo 7 times in 2015 that resulted in a ball in play and 7 times in 2016 that resulted in a ball in play.
In 2015, two of those went for over 100 mph, two went for 90-100, and 3 went for under 80. (Note to kids: don't average these numbers. You'll hurt yourself.)
In 2016, two of those went for over 100 mph, two went for 90-100, and 3 went for under 80. Exactly the same thing.
I repeated this for all pairs of players. There were 4860 pairs. In the cases where the number of PA was higher in one year or the other, I prorated down to the lower number of PA. This way we have the same 4860 pairs of players in both groups, and each of them is of the same size year to year. (I didn't control for park, but that can always be added in.)
We end up with 7656 PA. In 2015, 1605 of them resulted in a batted ball over 100mph. But in 2016, it was 1858, or 16% more. While not as large a number as Alan's, it is still quite significant. On the flip side, in 2015, 1872 resulted in a batted ball under 80mph, while in 2016 it was 1746.
Now, let's do like Alan did and "shift" the numbers. Rather than comparing number of batted balls over 100mph in each year, let's count those over 100mph in 2016 and those over 99mph in 2015. In effect, we're saying that batted balls are being hit 1mph faster in 2016 than 2015. What do we get?
An almost perfect match! So, it seems that balls are being hit harder. And we know that we're getting more HR in 2016 than 2015. All evidence shows balls are being hit harder. But we don't know the reason. While the ball is the easy culprit to point to, it could also be other things, like the strike zone, or the approach of hitters, or a combination of factors. For example, if hitters are deciding to be a bit more patient, they'll pay for it with more strikeouts, they'll get more walks, and they're waiting for better pitches to hit, which they can then hit harder. And, we do have more K, and we do have more BB and we do have more harder hit balls and we do have more HR. We don't have to rush for a reason. We can just let the data speak as best it can.
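The "shifting" comparison described above can be sketched as follows. The exit speeds here are made-up toy values, not the real matched-pair data, and the function names are mine.

```python
def count_over(speeds, threshold):
    """Number of batted balls hit harder than threshold (mph)."""
    return sum(1 for s in speeds if s > threshold)

def shifted_comparison(speeds_2015, speeds_2016, threshold=100.0, shift=1.0):
    """Count 2015 balls over (threshold - shift) and 2016 balls over threshold."""
    return (count_over(speeds_2015, threshold - shift),
            count_over(speeds_2016, threshold))

# Hypothetical exit speeds for the same batter-pitcher pairs in each year
toy_2015 = [99.5, 101.0, 88.0, 99.2]
toy_2016 = [100.5, 102.0, 89.0, 100.2]

print(shifted_comparison(toy_2015, toy_2016, shift=0.0))  # same threshold both years
print(shifted_comparison(toy_2015, toy_2016, shift=1.0))  # 2015 gets a 1 mph head start
```

If the counts line up only after the shift, as they do in this toy case, that is consistent with balls being hit about 1 mph harder in the later year.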
Friday, July 15, 2016
While we're making our strides into better encapsulating fielding, I'll be giving you a few peeks under the hood. You've seen @darenw Tweet some cool stuff, so I'll be doing the same. It's all a work in progress, so things will be changing as we learn new things, as we just follow wherever the data will take us.
Adam Eaton has become the go-to guy for Daren and me to look at fielding.
On the left, you will see where MLB RF position themselves. That top image is the angle relative to home plate. Zero is up the middle (home to 2B bag to the wall). +45 degrees is the RF foul line. We can see therefore that RF position themselves at around 25-28 degrees. The bottom image is distance to home plate, and we can see MLB RF are around 285 to 300 feet from home plate. Not shown here is the split between v LHH and v RHH. I'll talk about that in the future, as there's something interesting there, in addition to the CF.
Adam Eaton matches the mode of the league, at 26-27 degrees, and rarely ventures outside that zone. How does that compare to other outfielders? I don't know yet! But doing this for all RF is one of the next items on the agenda, not to mention seeing differences by park. Distance to home plate shows that Eaton definitely plays a bit shallower, at 275-290 feet, about 10 feet shallower than the league RF. Given the results that Daren tweeted the other day (see image below), those ten feet might be what he needs. Still too early for conclusions, but we'll get there.
Sunday, July 10, 2016
We saw some cool stuff from Greg several years back. I gave some background as well. And even before all that, Rick Swanson wrote about it as well. Now we're seeing it with actual data from Daren.