The last decade has seen much discussion and evolution in sabermetric thought around the relative abilities of batters, pitchers, fielders, and Lady Luck to control the outcome of batted balls. Data collected by Sportvision and MLBAM sheds new light on this question, but before we tackle that data, let’s review some of the history of how we came to our current state of knowledge.
When Voros McCracken published his Defense-Independent Pitching Statistics in 2001, his findings were considered extremely controversial. Since that time, however, the sabermetric community has largely adopted his conclusions, with some refinements and caveats.
McCracken refined his approach a year later and summarized his conclusions as follows:
1. The amount that MLB pitchers differ with regards to allowing hits on balls in the field of play is much less than had been previously assumed. Good pitchers are good pitchers due to their ability to prevent walks and homers and get strikeouts in some sort of combination of those three.
2. The differences that do exist between pitchers in this regard are small enough so that if you completely ignore them, you still get a very good picture of the pitcher’s overall abilities to prevent runs and contribute to winning baseball games.
3. That said, the small differences do appear to be statistically significant if generally not very relevant.
The following year, Tom Tippett published an extensive study that modified some of McCracken’s conclusions. Tippett’s summary of his work mostly reflects the current state of knowledge on the topic:
1. Pitchers have more influence over in-play hit rates than McCracken suggested. In fact, some pitchers (like Charlie Hough and Jamie Moyer) owe much of their careers to the ability to excel in this respect.
2. Their influence over in-play hit rates is weaker than their influence over walk and strikeout rates. The most successful pitchers in history have saved only a few hits per season on balls in play, when compared with the league or team average. That seems less impressive than it really is, because the league average is such a high standard. Compared to a replacement-level pitcher, the savings are much greater.
3. The low correlation coefficients for in-play batting average suggest that there's a lot more room for random variation in these outcomes than in the defense-independent outcomes. I believe this follows quite naturally from the physics of the game. When a round bat meets a round ball at upwards of 90 miles per hour, and when that ball has laces and some sort of spin, minuscule differences in the nature of that impact can make the difference between a hit and an out. In other words, there's quite a bit of luck involved.
4. Year-to-year variations in IPAvg-versus-team can occur if the quality of a pitcher's teammates varies from year to year, even if that pitcher's performance is fairly consistent.
5. The fact that there's room for random variation doesn't necessarily mean a pitcher doesn't have any influence over the outcomes. It just means that his year-to-year performances can vary randomly around a value other than zero, a value that reflects his skills.
6. Unusually good or bad in-play hit rates aren't likely to be repeated the next year. This has significant implications for projections of future performance.
7. Even if a pitcher has less influence on in-play averages than on walks and strikeouts, that doesn't necessarily mean that in-play outcomes are less important. Nearly three quarters of all plate appearances result in a ball being put in play. Because these plays are much more frequent, small differences in these in-play hit rates can have a bigger impact on scoring than larger differences in walk and strikeout rates.
In 2005, John Burnson found that pitchers did not have much impact on their rate of home runs allowed other than the extent to which they allowed outfield flies in general. (Dave Studeman created the xFIP statistic based upon this concept, normalizing not only a pitcher's BABIP rate but also his rate of home runs allowed per outfield fly ball.)
In 2005 and 2006, respectively, J.C. Bradbury and David Gassko found that pitchers had no consistency from year to year in their rate of line drives allowed. They confirmed the finding that pitchers had little year-to-year consistency in the rate of home runs allowed on outfield flies, and they also observed some statistically significant year-to-year correlation in pitchers’ popup rates.
Having done this research, it becomes obvious why Voros’ original postulate works so well. While pitchers exhibit great control over the types of balls in play they allow, they show little overall control on the two batted ball types that impact BABIP the most—infield flies (where there is some year-to-year correlation) and line drives (where there is none). More so, as infield flies occur relatively rarely (constituting only slightly more than 4% of all balls in play), they will not have enough of an overall impact for any strong year-to-year relationship in year-to-year BABIP. You can make sense of a pitcher’s season just by looking at his home run, strikeout, and walk rates. But you’ll get a better and more detailed picture by using batted ball data.
At this point the devolution of the pitcher’s control over batted balls in sabermetric understanding was basically complete. What mattered on balls in play was whether a pitcher allowed ground balls or fly balls; the rest of his batted-ball performance was unpredictable from year to year. Many analysts thus concluded that strikeouts, walks, and ground ball rate (and perhaps popup rate) were all that mattered for major-league pitchers. In this view, batted ball results beyond getting ground balls (and popups) were due either to the performance of the batter, the pitcher’s fielders and park, or to unrepeatable luck.
Other analysts, including this author, believed that the nature of the physics of the game indicated that, though the current statistics did not show it, the pitcher must have significant control not just over the vertical angle at which the ball came off the bat but also over whether the batter’s contact itself was weak or solid. In fact, a conversation to that effect with Tom Tippett at the 2008 Sportvision PITCHf/x Summit has stayed in my mind ever since. I hope that this study will illuminate the question of whether major-league pitchers have a varied and persistent skill in eliciting weak contact.
At that same 2008 PITCHf/x Summit, Peter Jensen presented a proposal for measuring the initial speed of batted balls using the PITCHf/x camera footage. Over the following off-season, Sportvision developed the HITf/x system to do just that, and the following summer, Sportvision released the HITf/x data from April 2009 for public study.
Earlier this year, I examined the April 2009 HITf/x data to learn whether pitchers had a persistent skill around quality of contact. I found that batters seemed to have a greater degree of control over how hard the ball was hit but that pitchers also had a significant degree of control over batted ball speed. However, the one-month sample size restricted the ability to draw firmer quantitative conclusions, and I did not publish my findings at that time.
This summer, Sportvision graciously provided me with the full season of 2008 HITf/x data, allowing me to study the question on a larger sample of just over 124,000 batted balls.
The HITf/x data measures the speed and direction of each batted ball throughout its trajectory in the PITCHf/x camera frames, which cover roughly the area between home plate and the pitcher’s mound. The reported speed is the average speed over this distance, which will be slightly lower than the initial speed off the bat due to the drag force. In addition, the speeds of ground balls that bounce very near home plate may be difficult to measure prior to the first bounce. Nonetheless, I believe that the initial speeds reported in the data are accurate and consistent enough for this type of evaluation.
To measure the quality of contact, I calculated the initial speed of batted balls in the plane of the playing field. Popups or balls pounded sharply into the ground may leave the bat at a high speed, but they are not usually difficult to field. Balls that travel quickly toward the outfield fence provide a much greater challenge to the fielders.
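As a rough illustration of the "speed in the plane of the playing field" idea, the horizontal component can be sketched as the total speed off the bat times the cosine of the vertical launch angle. The function name and the two sample batted balls below are hypothetical, and the actual HITf/x field names and processing are not described in this article.

```python
import math


def horizontal_speed(speed_mph: float, vertical_angle_deg: float) -> float:
    """Component of the batted-ball speed in the plane of the playing field.

    Assumes the vertical angle is measured up (or down) from the horizontal.
    """
    return speed_mph * math.cos(math.radians(vertical_angle_deg))


# Hypothetical batted balls: (total speed off the bat in mph, vertical angle in degrees)
line_drive = horizontal_speed(95.0, 12.0)   # most of the speed is horizontal
popup = horizontal_speed(80.0, 75.0)        # most of the speed is vertical

print(f"line drive hSOB: {line_drive:.1f} mph")
print(f"popup hSOB:      {popup:.1f} mph")
```

A hard-hit popup and a hard-hit line drive can have similar total speeds but very different horizontal speeds, which is why the horizontal component is used here as the measure of contact quality.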
How does the horizontal component of the speed of the ball off the bat relate to the chances that a ball will fall for a hit?
A batted ball with a horizontal speed off the bat (hSOB) of less than 60 mph had only about a 10 percent chance of turning into a hit. These batted balls were typically infield popups or weak ground balls. At horizontal speeds above 50 or 60 mph, the harder the ball was hit, the better the chance the batter reached safely. When the hSOB was 100 mph or more, the chance of getting a hit exceeded 60 percent.
We will revisit later how quality of contact and other factors affect batting average on balls in play, but let’s return to the question of who controls the quality of contact.
I randomly split the batted balls from the 2008 HITf/x data into two halves and compared the average hSOB between halves for each pitcher and batter with at least 300 total batted balls.
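A split-half comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: it shuffles each player's batted balls and cuts them into two equal halves (the study's exact splitting procedure is not specified beyond "randomly"), and the demo data are synthetic, not HITf/x measurements.

```python
import random
from collections import defaultdict
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(batted_balls, min_total=300, seed=0):
    """Correlate each player's average hSOB in one random half with the other half.

    batted_balls: iterable of (player_id, hsob_mph) pairs.
    Players with fewer than min_total batted balls are skipped.
    """
    rng = random.Random(seed)
    by_player = defaultdict(list)
    for player, hsob in batted_balls:
        by_player[player].append(hsob)

    first, second = [], []
    for speeds in by_player.values():
        if len(speeds) < min_total:
            continue
        shuffled = speeds[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first.append(mean(shuffled[:half]))
        second.append(mean(shuffled[half:]))
    return pearson_r(first, second)


# Synthetic demo (made-up data): 40 players with 400 batted balls each.
rng = random.Random(42)
sample = []
for player in range(40):
    talent = rng.gauss(70.9, 3.0)  # hypothetical true average hSOB
    sample.extend((player, rng.gauss(talent, 20.0)) for _ in range(400))

r = split_half_r(sample)
print(f"split-half r = {r:.2f}")
```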
Batters have a good deal of correlation between halves of the sample, with a correlation coefficient of r=0.76 with an average of 201 batted balls in each half. That means that we would add 63 batted balls (or about one month’s worth) at league average to the observed average speed for each batter in order to estimate his true skill.
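The arithmetic behind the 63 batted balls is not spelled out in the text, but it is consistent with the common shrinkage rule in which a split-half correlation r on n batted balls per half implies adding n(1 - r)/r league-average batted balls. The sketch below assumes that rule, and the function names are illustrative. It reproduces the regressed value in the first row of the batter table (418 batted balls, 79.5 mph observed).

```python
def ballast(n_per_half: float, r: float) -> float:
    """League-average batted balls to add, assuming r = n / (n + k)."""
    return n_per_half * (1.0 - r) / r


def regress(observed: float, n: float, league: float, k: float) -> float:
    """Weighted average of the observed mean (n balls) and the league mean (k balls)."""
    return (n * observed + k * league) / (n + k)


LEAGUE_HSOB = 70.9  # league-average hSOB in mph, from the tables

k_batters = ballast(201, 0.76)                  # about 63 batted balls
top_batter = regress(79.5, 418, LEAGUE_HSOB, 63)

print(f"ballast for batters: {k_batters:.1f} batted balls")
print(f"regressed hSOB:      {top_batter:.1f} mph")
```

The fewer batted balls a player has, the more his observed average is pulled toward the league average, which is why players with small samples can appear in the tables only if their observed averages are quite extreme.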
Here are the batters (excluding pitchers) with the highest and lowest average hSOB in 2008, after applying the regression toward the league average:
| Batter | Batted Balls | Observed hSOB (mph) | Regressed hSOB (mph) |
|---|---|---|---|
| | 418 | 79.5 | 78.4 |
| | 410 | 79.3 | 78.2 |
| | 450 | 78.3 | 77.4 |
| | 352 | 77.9 | 76.9 |
| | 441 | 77.5 | 76.7 |
| | 220 | 78.3 | 76.6 |
| | 485 | 77.3 | 76.6 |
| | 450 | 77.3 | 76.5 |
| | 407 | 77.1 | 76.2 |
| | 126 | 78.9 | 76.2 |
| League Average | | 70.9 | |
| | 183 | 63.0 | 65.0 |
| | 190 | 63.0 | 65.0 |
| | 155 | 62.6 | 65.0 |
| | 222 | 63.1 | 64.9 |
| | 261 | 63.2 | 64.7 |
| | 68 | 58.4 | 64.4 |
| | 360 | 60.5 | 62.1 |
| | 400 | 60.6 | 62.0 |
| | 328 | 60.0 | 61.5 |
| | 212 | 54.9 | 58.5 |
Here is the same data for pitchers who allowed at least 300 batted balls in 2008.
Pitchers have fairly good correlation between halves of the sample, though not as good as batters. The correlation coefficient is r=0.48 with an average of 251 batted balls in each half. That means that we would add 269 batted balls (or about three months’ worth for a starter) at league average to the observed average speed for each pitcher in order to estimate his true skill.
One thing that stands out is that the spread of values among pitchers is not as big as the spread among batters. For players with at least 300 batted balls, the standard deviation in average hSOB for batters was 3.2 mph, and for pitchers it was 1.8 mph.
Here are the pitchers with the lowest and highest average hSOB allowed in 2008, after applying the regression toward the league average:
| Pitcher | Batted Balls | Observed hSOB (mph) | Regressed hSOB (mph) |
|---|---|---|---|
| | 155 | 60.6 | 67.1 |
| | 175 | 63.7 | 68.1 |
| | 570 | 67.2 | 68.4 |
| | 125 | 63.2 | 68.4 |
| | 186 | 64.9 | 68.5 |
| | 130 | 63.6 | 68.5 |
| | 414 | 67.0 | 68.6 |
| | 123 | 63.5 | 68.6 |
| C.C. Sabathia | 624 | 67.8 | 68.7 |
| | 252 | 66.4 | 68.7 |
| League Average | | 70.9 | |
| | 206 | 75.9 | 73.1 |
| | 257 | 75.3 | 73.1 |
| | 570 | 74.1 | 73.1 |
| | 385 | 74.6 | 73.1 |
| | 530 | 74.4 | 73.2 |
| | 380 | 74.9 | 73.2 |
| | 481 | 74.5 | 73.2 |
| | 528 | 74.7 | 73.4 |
| | 548 | 75.0 | 73.6 |
| | 662 | 75.1 | 73.9 |
To look further into the question of who controls the speed of the ball off the bat, I performed a multivariate regression comparing the hSOB of each of the 102,000 batted balls in the sample to the regressed average hSOB for the batter and pitcher involved, where the batter and pitcher each had at least 100 batted balls. The best prediction for the horizontal speed of the ball off the bat comes from weighting the pitcher’s regressed average hSOB by 1.83 and the batter’s regressed average hSOB by 1.20.
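For readers who want to see the mechanics, a two-predictor least-squares fit of this general shape can be sketched in plain Python. Everything in the demo is an assumption made for illustration: the data are synthetic, the "true" weights of 1.0 are invented, and the inclusion of an intercept is a guess, since the study's software and model specification are not given. The sketch is not expected to reproduce the 1.83 and 1.20 weights.

```python
import random


def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Synthetic demo: each batted ball gets a pitcher average and a batter average
# drawn with the spreads quoted in the article, plus large ball-to-ball noise.
rng = random.Random(1)
LEAGUE = 70.9
pitcher_avg = [rng.gauss(LEAGUE, 1.08) for _ in range(5000)]
batter_avg = [rng.gauss(LEAGUE, 2.76) for _ in range(5000)]
hsob = [
    LEAGUE + 1.0 * (p - LEAGUE) + 1.0 * (b - LEAGUE) + rng.gauss(0.0, 20.0)
    for p, b in zip(pitcher_avg, batter_avg)
]

intercept, w_pitcher, w_batter = ols_two_predictors(pitcher_avg, batter_avg, hsob)
print(f"pitcher weight = {w_pitcher:.2f}, batter weight = {w_batter:.2f}")
```

Because pitcher averages vary so little, the fitted pitcher weight is estimated much less precisely than the batter weight for the same number of batted balls.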
However, the spread (standard deviation) of the batters’ regressed average hSOB of 2.76 mph is wider than the spread of the pitchers’ regressed average hSOB of 1.08 mph. Thus, we can estimate that the batter’s average hSOB has about (2.76*1.20) / (1.83*1.08) = 1.7 times as much influence on the resulting hSOB of the batted ball as does the pitcher’s average hSOB.
To put it another way, the pitcher’s average quality of contact is more predictive of the quality of contact on a given batted ball than is the batter’s average quality of contact. However, the average quality of contact varies much less among pitchers than it does among batters in major-league baseball. As a result, the identity of the batter is more important in determining the resulting quality of contact than the identity of the pitcher, at least to the extent that we can determine it with these statistical techniques.
I also performed a similar regression comparing the hSOB of the 40,000 batted balls in the sample to the observed average hSOB for the batter and pitcher involved where the batter and pitcher each had at least 300 batted balls. The results are similar. For that sample, the best prediction for the horizontal speed of the ball off the bat comes from weighting the pitcher’s observed average hSOB by 1.04 and the batter’s observed average hSOB by 0.99. The spread of the batters’ observed average hSOB of 3.16 mph is wider than the spread of the pitchers’ observed average hSOB of 1.77 mph. Thus, we can estimate that the batter’s average hSOB has about (3.16*0.99) / (1.04*1.77) = 1.7 times as much influence on the resulting hSOB of the batted ball as does the pitcher’s average hSOB.
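The two influence ratios quoted above can be checked with a few lines of arithmetic: each is the product of a regression weight and the spread of the corresponding averages, batter over pitcher. The function name is illustrative, and the inputs are the figures given in the text.

```python
def influence_ratio(batter_sd, batter_weight, pitcher_sd, pitcher_weight):
    """Batter influence relative to pitcher influence: (sd * weight) for each side."""
    return (batter_sd * batter_weight) / (pitcher_sd * pitcher_weight)


# Regressed averages, batters and pitchers with at least 100 batted balls
ratio_100 = influence_ratio(2.76, 1.20, 1.08, 1.83)

# Observed averages, batters and pitchers with at least 300 batted balls
ratio_300 = influence_ratio(3.16, 0.99, 1.77, 1.04)

print(f"100+ sample: {ratio_100:.2f}")
print(f"300+ sample: {ratio_300:.2f}")
```

Both samples round to the same ratio of about 1.7 even though the individual weights differ, because regressing the averages shrinks their spread and the fitted weights grow to compensate.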
I tried the same regression using pitcher strikeout rate per plate appearance as an additional independent variable, but it had virtually no additional explanatory power in the model (p-value of 0.47).
It is probably possible to build a more sophisticated model to predict batted ball speed based upon batter and pitcher characteristics. However, this simple model suggests that the batter has roughly 1.7 times as much influence on the quality of contact as does the pitcher. A major-league pitcher controls not only whether he gets ground balls or fly balls but also, to a significant degree, how hard the ball is hit, though the batter has somewhat more control over the quality of contact than the pitcher. I consider this an extremely significant finding.
Given what we know about DIPS and the unreliability of pitcher BABIP, this conclusion may surprise some. However, let me quickly clarify two points.
First, I have not excluded home runs from the analysis to this point. Removing home runs was a construct, and an illuminating one, that McCracken chose to make DIPS work. However, if we wish to discuss quality of contact, it would be arbitrary and incorrect to remove many of the hardest-hit balls from the sample. We have access to data that was not available a decade ago; thus, we can look at the quality of contact more directly. This analysis is independent of the fielders by virtue of looking at the batted ball speed rather than by segregating by batted ball outcome.
Second, batter and pitcher split-half hSOB correlations are basically unchanged if home runs are excluded from the analysis.
It is possible to conduct a similar analysis with an eye toward better understanding BABIP. The causes of batted ball results are complex and interdependent, but in the second part of this study, I will sketch out some preliminary findings on that topic.
Thanks to Sportvision and MLBAM for providing the HITf/x data for the study. Thanks to Colin Wyers for his input and feedback. Thanks also to Brian Mills and Dave Studeman for their assistance.
Btw, doesn't your IP minimum skew results? Only "successful" pitchers will have the chance to pitch that many innings. Those who allow harder-hit balls might be weeded out (though maybe that's an accurate take on MLB).
As for those who hit weaker balls (as batters) or allow harder-hit balls (as pitchers), I wouldn't be surprised if they get weeded out earlier, perhaps very quickly for the many fringe-MLB-quality players who only get a brief chance to establish themselves. Tom Tippett's study of BABIP indicated as much. There is, of course, a selective sampling issue in that future playing time is allocated partly based upon the past outcomes for that player as opposed to their actual skill (we learn their skill partly from their outcomes).
I'd probably need multiple seasons of HITf/x data, or minor league HITf/x data, before I could tease out that effect better than was done in Tippett's study, for example.
And kudos to Sportvision for supplying the data. I understand they have a business to run, so I completely understand why HITf/x is not publicly released. Hopefully a model like this could work, where they provide back-data from previous years to the public.
Better to get to the party late than never get there at all. Even if Andrew Friedman already ate all the chips.
Really cool stuff. Thinking a bit more from our contact before, I think the next step here would be to look at within-pitcher variation in hSOB as well. Across-pitcher variability gives more of an idea of the spread in talent, while within-pitcher variation (with significant regression to the mean) might give more insight into the ability to control the outcome. I think that you do address this with the split-half correlations, but I'd love to see observed and regressed standard deviations for the players in the tables as well, to get an idea of the spread as you do at the aggregate level.
Really awesome stuff.
The only thing I could think was that there's practically/physically an upper limit on hSOB around 100-110 mph that is closer to the mean than is the lower limit at 0. Also, the distributions are typically not normal (peak above average with a long lower tail), so I don't know how well standard deviation describes the distribution in that case.
For example, take a fly ball that is in the air for four seconds before it is caught at the 375-ft sign against the left-center field wall. Ignoring the effect of drag that would have slowed the ball slightly, it traveled 375 feet horizontally in four seconds, for a speed of 375/4 = 93.75 ft/sec (equivalent to 63.9 mph).
HITf/x doesn't measure the whole flight of the ball, just the initial portion, but the idea is the same.
Take another example, a popup that is skied over the infield and caught half way down the third base line after seven seconds in the air. The popup may have come off the bat going really fast, maybe 70-80 mph, but most of that speed was vertical. The horizontal component of the speed was much less. Again ignoring air resistance effects, the ball went only 45 feet horizontally in 7 seconds, and 45/7 = 6.4 ft/sec = 4.4 mph.
The horizontal component of the speed tells you more about how solidly the ball was hit than the total speed (including the vertical component). It also tells you more about how difficult the ball was to field because it tells you how quickly the ball got to or past the fielder (how long they had to react, as you said.)
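The two back-of-the-envelope examples above (the 375-foot fly ball and the 45-foot popup) can be reproduced with a short sketch. Like the examples themselves, it ignores air resistance and computes an average horizontal speed over the whole flight. The function name is illustrative.

```python
FT_PER_SEC_TO_MPH = 3600 / 5280  # 1 ft/sec is about 0.682 mph


def avg_horizontal_speed_mph(distance_ft: float, hang_time_s: float) -> float:
    """Average horizontal speed over the flight, ignoring drag."""
    return distance_ft / hang_time_s * FT_PER_SEC_TO_MPH


fly_ball = avg_horizontal_speed_mph(375, 4)  # caught at the 375-ft sign
popup = avg_horizontal_speed_mph(45, 7)      # caught halfway down the third-base line

print(f"fly ball: {fly_ball:.1f} mph")
print(f"popup:    {popup:.1f} mph")
```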
Also, the next question that would occur to me is whether the batter/pitcher interaction has an impact on Batting Average. That is, batting average may not follow the line graph published above on a batter/pitcher specific basis. I assume that's something you'll touch on in the next piece?
Correlation between pitch speed and hSOB is not strong, at least not at major league game velocities. Pitch types and locations make a bigger difference than pitch speed itself. That's not to say that fastball speed has no effect, but it's a lesser effect, and it's not trivial to disentangle from pitch movement and from selective sampling effects (i.e., pitchers that throw slower are in MLB because they are above average at other things).
I'm not planning to directly address your last question in the next piece. It's something I've previously investigated from the April 2009 HITf/x data, but I don't intend to publish the results from the batter-pitcher model I developed from that.
However, it also seems to be true that the faster the pitch comes in, the harder it is for the batter to square up the bat on the ball.
These two effects seem to roughly cancel out in the MLB population of batters and pitchers, though the latter effect may be somewhat more important.
I'll just say that it's not inconsistent with what you can find about the effect of pitch types and location on BABIP from the public PITCHf/x data.
Is it fair to say that the slower a pitcher throws, the more control he has over "how hard the ball is hit," à la Moyer, Hough, and the knuckleballers? Can that argument be extended to the importance of having a quality changeup or offspeed pitch?
The pitcher and batter BOTH control quality of contact. The batter has a little bit more control over that than the pitcher, but the pitcher has a lot more control than people have thought since the acceptance of DIPS.
The pitcher presumably controls the quality of contact by deceiving the batter as to where and when he should swing.
Mo Rivera is one of the best, probably THE best, in baseball at this, and he throws hard. But he locates extremely well, and this makes it difficult for the batter to make solid contact with the ball.
That would be a way you could have slower batted ball speeds with a hard throwing pitcher.
Deception (which pitch speed influences) would also cause the same effect. The longer a batter needs to see a pitch to id it, the less time he has to gear up his swing.
So glad you were finally able to do this analysis.
Awesome work, as usual.
Matt
Bunts make up about 2 percent of batted balls in MLB, and a large portion of those are by pitchers, but for a few batters it's much more significant.
Taveras, for instance, had 12 percent of his batted balls as bunts in 2008, and Bonifacio 11 percent.
It would be wonderful if BP had a page giving an introduction to advanced metrics. My dad loves baseball and is not shy of numbers, but he didn't end up using the gift subscription I gave to him very often because the advanced metrics daunted him. I tried to send him introductory-type articles when they appeared, but really most articles on the site assume a significant level of familiarity with advanced metrics. That's okay and necessary -- you can't explain the genesis of BABIP every time you want to talk about it, and there are the glossary definitions -- but I would bet that my dad's not the only newcomer to the site who felt like he'd just never be able to "get it" and gave up.
I would also bet that for many other readers who do stick with the site, even among those of us who visit the site regularly, advanced metrics still feel like a second language, which we speak with varying degrees of facility and in which we still don't feel completely fluent. A section of the site devoted to a general introduction to advanced metrics would be great for those of us who never systematically studied advanced metrics (and would like not to have to leave this site to do so -- call me lazy!).
Thank you again for a terrific article. It's exciting new stuff, and a great encapsulation of the foundational concepts of DIPS.
It's always helpful for me to review the past literature on a topic when I am studying it, and I also think I owe the reader and the previous researchers a mention of the work that I am building on.
We also don't know much about whether it is helpful for a pitcher to vary the speed on his fastballs by 2-3 mph. That turns out to be a very difficult question to study properly because speed changes are related to differences between four-seam and two-seam fastballs (and in some cases pitchers use cutters as fastballs, too). Those pitch types tend to be used in different locations, different ball-strike counts, etc., which complicates the analysis.
I agree, though, that the kind of thing you mention is where we are headed with this, though I don't know that a linear regression is the best tool for the job. I prefer to develop a physical model for what is happening, if I can. Linear regression can play a supporting role in that process, but ultimately, we want to know why and how the batter and pitcher do what they do.