Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Sunday, March 03, 2024

NaiveWAR and VictoryShares

In my spare time, I'm working on an open-source WAR, that I call NaiveWAR.  Those of you who have been following me know some of the background on NaiveWAR, notably that it is tied (indirectly to start with) to Win/Losses of teams (aka The Individualized Won/Loss Records).  My biggest failing in developing the WAR framework was not also providing the mechanism for W/L at the same time.  That will be rectified.

The most important part of all this is that it's all based on Retrosheet data, and everyone would be able to recreate what I do.  And it would be totally transparent, with plenty of step by step discussion, so everyone can follow along.  I was also thinking of potentially using this as a way to teach coders SQL.  That's way out in the distance, still have to work things out, but just something I've been thinking about as I'm coding this.  I even have the perfect name for this course, which I'll divulge if/when this comes to fruition.

Interestingly, RallyMonkey, who is the progenitor of the WAR you see on Baseball Reference, seems to be embarking on a somewhat similar campaign. You can see a lot of the overlap, with tying things to W/L records, with the emphasis on Retrosheet.  The important part of doing that is we'd be able to do it EACH way, with/without tying it to W/L, so you can see the impact, at the seasonal, and career, level. In some respects, he'll go further than I will with regards to fielding, mostly because I have so little interest in trying to make sense of that historical data, given the level of access Statcast provides me.  But also partly because by me not doing it, it opens the doors for the Aspiring Saberists to make their mark, that somewhere between my presentation and Rally's presentation, they'll find that inspiration.

All to say: I dunno what I'm trying to say!

(35) Comments • 2024/10/14 • WAR

Saturday, February 24, 2024

Complete Historical Run Expectancy Chart

This shows the following, for the entirety of the Retrosheet data, broken up into roughly 15-20 year time periods

  • Run Expectancy
  • Run Frequency
  • State Frequency
  • Run Value of HR

(Click to embiggen)

(18) Comments • 2024/03/04 • Run_Win_Expectancy

Thursday, February 22, 2024

When is Replacement Level not the Replacement Level

The concept of Replacement Level (though I prefer the term Readily Available Talent, which you will see makes more sense) is pretty straightforward.  What kind of contribution can you get for the minimal cost?  If you have no farm system at all, that level is roughly a .300 win% level.  That's the Readily Available Talent.  By spending the absolute minimum on the free agent market, you will field a .300 team.  At least theoretically.  

In reality, all clubs have a minor league system.  And they spend millions of dollars on players and player development and player acquisition.  Because those players are now Readily Available Talent for no ADDITIONAL cost (the money spent is already sunk), suddenly, the baseline level player is not a .300 win% talent, but probably closer to .350 win% talent.  While this player would cost you just the league minimum, it did cost you in terms of your minor league setup.  

This is why it gets tricky when you try to decide what the baseline level is.  Furthermore, if you decide to field an entire team only from the minor leagues, well, not all the players will be .350 win% talent.  After your very top prospects, you will start to go below that .350 win% level quite quickly.  So a team of your best 40 minor league players is likely going to play below a .300 win%, probably even down to .250 or .200 win%.

Therefore, while the concept of Readily Available Talent is real, as it's where all the decision-making happens, the actual level really requires different baselines for different uses.  Sorry to make unclear something that lacked clarity to begin with.

Sunday, February 18, 2024

Attribution of Player Performance within the Context of the game

This is how y'all see it...

Poll 1:

Steve Young goes 15-15 for 180 yards in the 1st half. The score is somehow 0-0.

Joe Montana plays the 2nd half, goes 3-15 for 36 yards, with all 3 completions being TD passes. The final score is 21-0.

Between Montana and Young, who gets more of the share for that 49ers win?

2 to 1 for Montana

Poll 2:

Young plays 1st half of every game completing 75-100% of his passes in each, averaging 12 yds per completion. Yet never throws a TD, his RB never score a TD

Montana plays 2nd halfs completing 20-40% of passes, averaging 9 yds per completion. Season total 32 TD

Who is 49ers MVP?

Almost 2 to 1 for Montana

Poll 3:

Wade Boggs goes for 4-4. Each time, with a runner on 1B. Those runners, plus Boggs, were left stranded at end of each inning

Bottom of 9th, PH Jim Rice leads off inning, hits walkoff HR, to make it 1-0 for Redsox

Between Boggs and Rice, who gets more of share for that Sox win?

Nearly 3 to 1 for Rice

Poll 4:

Wade Boggs has season for the ages, hitting over .400, OBP over .500. Yet somehow scored only 80 runs and drove in only 60, while batting 2nd.

Jim Rice was below average in BA, OBP, SLG, yet somehow managed to score 90 runs and drive in 110, while batting 6th.

Who was Sox MVP?

Overwhelmingly for Boggs

Poll 5:

Joe Carter plays double-header.

Game 1: 1-4 with 0 RBI

Game 2: 1-4 with 3 RBI

Everything else about what Carter did was same in both games

How do we give Carter attribution: with or without caring how it *directly* impacted that game?

Do both games get same value for Carter?

Almost 2 to 1 for Game 2

Poll 6:

Jack Black is pro blackjack player, regularly wins 1000-3000$ each day. But on this particular day, he lost 2000$.

Alan is his friend at the next table, hitting when he should stick, sticking when he should hit. And yet on this same day, he ends up winning 3000$.

What say you:

Overwhelmingly for Process over Results


Saturday, February 17, 2024

NaiveWAR 2023

Four years ago, in a series of tweets, I introduced NaiveWAR, essentially the simplest uber-metric possible.  I finally coded it up last night. Here are the results for 2023 (click to embiggen).  

True to its name, I used the absolute most minimum information I could, and still give plausible, naive, results.  That data is exclusively limited to players who participated in: Runs, Outs, Plate Appearances. And that is it.  I couldn't have made it any more naive and still give plausible results. Cashmere Ohtani is that red dot.


Friday, February 16, 2024

Statcast Lab: Do some batters overswing?

On his 30% weakest swings, LHH Luis Garcia (Nationals) generated 2 runs per 100 swings above average.  On his 30% hardest swings, he generated 7 runs per 100 swings below average.  He led MLB in terms of that gap in performance.  Can we say he overswings?  I don't know, we'd have to look at each of his swings to see why the results came out as they did.  But he clearly performed better when his swings were the weakest.

On the flip side are batters who far far exceeded their performance on their hardest swings compared to their weakest swings.  Among this group are Ohtani and Yordan Alvarez, who are each around 13 runs above average on their hardest swings and 4 runs below average on their weakest swings.  (League average is +0.5 and -5.0 runs per 100 swings, respectively.)

Of course, you have to be careful here, since a batter is going to potentially check his swing (unsuccessfully), and so the swing speed is not necessarily some sort of independent variable to his approach.

Click to embiggen.


UPDATE: Here is the distribution in speed, as well as the run values, for Garcia and Ohtani. Obviously, Ohtani is in blue. At 81+ is when Ohtani is doing the damage. Garcia you can see had some success at under 68. However, given the combo of 67+68 is a net negative, it may very well be that that is just before-the-fact cherry-picking. That said, Garcia at 74+ or 76+ is a net negative, and it may very well be that he overswings.

(2) Comments • 2024/02/16 • Bat_Tracking

Statcast Lab: Swing Speed Distributions by Ball-Strike Count

Only showing 0-2 v 3-0, plus overall.  Click to embiggen


Thursday, February 15, 2024

Statcast Lab: Swing Speed Distributions by Pitch Types

(click to embiggen)

(1) Comments • 2024/02/16 • Bat_Tracking

Wednesday, February 14, 2024

Explaining the reasoning behind the construction of OPS+

Suppose the league OBP is .300, and the number of runs scored per game is 4.0

  • If a team's OBP is .330, that is 10% higher than the .300 league, or 110 in OBP+ parlance. So, 10% more runners roughly means 10% more runs. And so, 10% more runs than 4.0 is 4.4. (Assume team SLG matches league SLG.)

Suppose the league SLG is .400, and the number of runs scored per game is 4.0

  • If a team's SLG is .440, that is 10% higher than .400, or 110 in SLG+ parlance. So, 10% more total bases roughly means 10% more runs. And so, 10% more runs than 4.0 is 4.4. (Assume team OBP matches league OBP.)

Now, if you have BOTH 10% more runners AND 10% more total bases, we'll actually end up with roughly 20% more runs.

  • If you do OBP+ plus SLG+ minus 100, you get 110 + 110 - 100 = 120 in OPS+ parlance
  • If you did OPS/lgOPS, you'll get .770/.700 = 1.1 or 110 in OPS+ parlance

What's better/right?

  • To the extent you want to be pedantic, OPS+ in this illustration should be 110. 
  • To the extent what you care about is associating OPS to runs in a 1:1 manner, then OPS+ should be 120.
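The two paths can be put side by side with a few lines of arithmetic. This is only a sketch using the illustrative league and team figures from this post; the helper name `plus` is made up for the example.

```python
def plus(value, league):
    """Index a rate stat to the league, where 100 is league average."""
    return 100 * value / league

# Illustrative figures from the text above
lg_obp, lg_slg = 0.300, 0.400
team_obp, team_slg = 0.330, 0.440

obp_plus = plus(team_obp, lg_obp)
slg_plus = plus(team_slg, lg_slg)

# Path 1: OBP+ plus SLG+ minus 100 (tracks runs roughly 1:1)
additive = obp_plus + slg_plus - 100

# Path 2: OPS divided by league OPS (the literal, pedantic reading)
ratio = plus(team_obp + team_slg, lg_obp + lg_slg)

print(round(obp_plus), round(slg_plus), round(additive), round(ratio))
```

The additive version comes out at 120 and the ratio version at 110, which is the whole disagreement in one line.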

wRC+ uses the same process in terms of converting wOBA into runs: 200*wOBA/lgWOBA - 100. The only difference is in the name, with wRC+ being clearer as to its intent, and not directly being linked to wOBA by name (even if it is under the hood), other than that lowercase w.

Choose your path.


Tuesday, February 13, 2024

The Math behind the NFL OT Playoff Rule

I Cut, You Choose?  It's not exactly that, but it's close to that.

I'm going to come up with some random numbers. I don't follow football enough to give you good numbers, so I'll just try some random numbers.

In this iteration, I'll assume the chance of NOT scoring is 60%. And when you score, it's just as likely you will TD as FG.

So, let's start. Team 1 has the ball, and 20% of the time has 3 points on the board, and 20% of the time they put 7. Now, let's follow each of those three branches, starting with the scoreless one.

  • If Team 2 is also scoreless, it goes into sudden death. We'll assume Team 2 is more likely to score, so let's make it scoreless 55%, and scoring 45%.
  • With the FG branch: we'll assume here Team 2 is more likely to try for the TD. So, scoreless 65% of the time, FG 10%, TD 25%.
  • Finally the TD branch: Team 2 has to be more aggressive, so chance of scoreless is 70%, with 0% for FG, 15% for 6 points (and a loss) and 15% for 8 points (and a win).

The sudden death calculation is a simple calculation. At a 60% scoreless chance for both teams, then it's 62.5% chance for Team 1 to win their sudden death.

All of this now becomes a straightforward probability distribution calculation. And in this illustration, the win% is 52% for Team 1.
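Here is a minimal sketch of that calculation with the made-up numbers above. Two things are assumptions on my part, not stated rules: Team 1 gets the ball first once it goes to sudden death (including after matching field goals), and every sudden-death drive is scoreless 60% of the time. The function name is hypothetical.

```python
def sudden_death_first_mover(p_scoreless):
    """Chance that the team with the ball first wins sudden death,
    assuming every drive is scoreless with probability p_scoreless."""
    p_score = 1 - p_scoreless
    # Either score right away, or both teams come up empty and it repeats.
    return p_score / (1 - p_scoreless ** 2)

sd = sudden_death_first_mover(0.60)

# Team 1's win chance within each branch of its opening drive.
# Scoreless branch: Team 2 scoreless 55% (sudden death), scores 45% (Team 1 loses).
scoreless_branch = 0.55 * sd
# FG branch: Team 2 scoreless 65% (win), FG 10% (sudden death), TD 25% (loss).
fg_branch = 0.65 + 0.10 * sd
# TD branch: Team 2 scoreless 70% (win), 6 points 15% (win), 8 points 15% (loss).
td_branch = 0.70 + 0.15

# Weight each branch by how often Team 1's opening drive ends that way.
team1_win = 0.60 * scoreless_branch + 0.20 * fg_branch + 0.20 * td_branch

print(f"sudden death: {sd:.3f}, Team 1 overall: {team1_win:.3f}")
```

With these inputs the sudden-death figure is 62.5% and Team 1 lands at roughly 52%, matching the numbers quoted in the post.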

Now, what happens if I change the chance of scoreless down to 50%, and adjust everything off that? Now the chance of Team 1 winning is 51%.

If chance of scoreless is down to 40% for any drive, then team 1 winning is 49%.

Indeed, this is how it looks based on the scoreless rate, from 10% to 90%:

So, it is easy enough to see that when you have to input two specific teams, things can change from this baseline, and so what may show here as 47% can in reality be 52%.

That's the baseline. Now, all we need is for someone to come up with something a bit more intricate, and we'll see... probably the same thing.

So, whoever over at NFL ops came up with this scheme likely proposed this setup because it's around 50/50, all depending on whatever actual teams are involved.

(1) Comments • 2024/02/16

Thursday, February 08, 2024

VOZ - Value Over Zero

Everyone has their own VOZ method, the Value over Zero. The zero-point is the point at which that thing has no value. This is most clearly demonstrated with Fantasy Leagues. If you play Fantasy sports games, congratulations, you have a VOZ method. In a world where you have several hundred players, but only a few hundred will get selected, all the unselected players have a value of zero. You are only going to spend money on players who have value above the zero-baseline.

That zero-baseline is different for every position. A below league-average batter at catcher has value, while the same batting line for a 1B has almost no value. This concept is quite clear in Fantasy sports. It's a little murkier with real baseball players, but it's real nonetheless. All we need to do is establish what that zero-baseline is.

On Twitter, I asked what a 200 IP, 11-11 pitcher was equal to in value, and the most popular response was a 100 IP 8-3 pitcher. Now, follow me here, this is the important part. 11 wins and 11 losses has the exact same value, according to the voters, as someone with 8 wins and 3 losses. (In this illustration, the W/L record is a proxy for a pitcher's overall performance.) Again 11-11 = 8-3. If the two pitchers are equal, then the difference between the two pitchers is zero. In other words, this is what the voters are saying:

11-11 = 8-3 + 3-8

This is obvious, right? 8 wins and 3 losses, plus 3 wins and 8 losses is 11 wins and 11 losses. And since 11-11 = 8-3, that implies that 3-8 = 0

In other words, a pitcher who has 3 wins and 8 losses, or a win% of 3/11, or .273, is worth zero. That is the zero-baseline: .273, at least in this illustration.

A fairly high number actually chose 7-4 as being equal to 11-11. This implies the zero-baseline for this group of folks was 4-7, or a .364 win%.

The smallest group chose 9-2 as being equal to 11-11, which implies a .182 win%.

To summarize: 51% implied .273, 34.5% implied .364, and 14.5% implied .182. Collectively that comes out to .291 win%. In other words, the zero-baseline level, the point at which a player has no value, is a win% of .291. This is what is commonly called the replacement level, but my preferred term is the Readily Available Talent level. And so, value over zero, or in this case Wins Over Zero (WOZ) is set so that we subtract .291 wins per game for every player.

An 11-11 pitcher is compared to a .291 pitcher given 22 decisions. And .291 x 22 is 6.4 wins and 15.6 losses. So, subtracting 6.4 wins from 11 wins is +4.6 wins, or 4.6 WOZ.

And that 8-3 pitcher? Well, .291 given 11 decisions is 3.2 wins and 7.8 losses. And 8 wins minus 3.2 wins is 4.8 wins, or 4.8 WOZ. The 7-4 pitcher has 3.8 WOZ. So, somewhere between 8-3 and 7-4, but closer to 8-3, is where you find your pitcher equivalent to 11-11.
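The whole chain, from poll shares to WOZ, fits in a short sketch. The vote shares and records are the ones quoted above; the function names `implied_zero` and `woz` are just labels for this example.

```python
# (share of voters, record they judged equal to 11-11)
votes = [(0.51, (8, 3)), (0.345, (7, 4)), (0.145, (9, 2))]

def implied_zero(wins, losses, ref=(11, 11)):
    """Win% of the leftover record: the reference record minus the equal one."""
    zero_w, zero_l = ref[0] - wins, ref[1] - losses
    return zero_w / (zero_w + zero_l)

# Vote-weighted zero-baseline across the three groups
baseline = sum(share * implied_zero(w, l) for share, (w, l) in votes)

def woz(wins, losses, baseline):
    """Wins Over Zero: actual wins minus baseline wins over the same decisions."""
    return wins - baseline * (wins + losses)

print(f"baseline: {baseline:.3f}")
for record in [(11, 11), (8, 3), (7, 4)]:
    print(record, round(woz(*record, baseline), 1))
```

That gives a baseline of .291, and 4.6, 4.8 and 3.8 WOZ for the 11-11, 8-3 and 7-4 pitchers, the same figures as in the text.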

That's how WAR works.

(4) Comments • 2024/03/03

Friday, February 02, 2024

Pull Rate and xwOBA

So Ben Clemens did terrific research on something that comes up every now and every then. And everyone that looks at it comes away with the same conclusion. So, it's good that Ben does this work, but after I comment on this, I'll show you something that is even more important.

The issue is: can't we include the Spray Direction with xwOBA, and not just rely on Launch Speed and Angle? The issue comes down to whether we want to explain the PLAY or the PLAYER. If you want to explain the PLAY, then naturally you need to know the spray direction, since a 370 foot pop fly down the line is a HR while a 370 foot pop fly straightaway is an easy out.

But do you know why we remove BABIP from a pitcher, and use only FIP? Right, because by and large we care about PLAYERS not individual PLAYS. BABIP contains far more noise than signal, which is why in an all-or-nothing situation, you want to use nothing of BABIP. If you want to weight it, you'd want maybe 20% of BABIP, but that removes the cleanliness of FIP. This is why FIP exists, to provide that clean break. If someone wanted to merge FIP and BABIP, they can do so, giving full weight to FIP and say 20% weight to BABIP.

So, about that spray angle: obviously we have pull hitters and spray hitters. They must have different value right? An xwOBA metric that totally ignores the spray direction must have some bias?

Well, sorta, kinda, if you look at it myopically, and not at all if you look at it holistically.

In Ben's article, he did something very smart, which is break up players into 4 groups based on their spray tendency, from heavy pull to heavy spray. And he did it even smarter by focusing on airballs. A pulled groundball for example is not what we are talking about in terms of xwOBA missing out on HR down the line.

I asked him for two pieces of information. The first is a summary chart of his last chart for all batters, not just the group he noted. And, you can see a bias here. The pull hitters, when we look at their Air balls, have a .487 wOBA, while the xwOBA was only .473. That's about a 15 point shortfall. And we see a larger effect for spray hitters, who, on air balls have a .474 wOBA, while their xwOBA was .492.

So, yes, he did find something. Myopically. Remember, we focused on airballs here. What we care about however is ALL batted balls. Are pull-hitters being biased against by xwOBA because we ignore their spray pattern?

I asked Ben for a chart for ALL batted balls as well. Well, here you go (looks like this is all their plate appearances, but no matter, since the K and BB values are equal in both). That bias shrinks all the way down to 2 or 3 points of wOBA, which is 1 or 2 runs. In other words, this is the FIP/BABIP story. BABIP has a ton of noise that in an all-or-nothing choice, you want to know none of it. And if you want to know some of it, it really has a small weight. And the same applies here: the spray direction has far more noise than signal, and so, you do not want to use it to evaluate players. Unless you severely underweight that data. And that's why xwOBA doesn't need the spray direction to evaluate PLAYERS.

(2) Comments • 2024/02/07

Draft Function

This is a mostly math post, and I'll be using draft data. If you don't care about either, you won't like this post.

I needed some data. It wasn't important for the purpose of this post what that data is, I just needed to convey the general point that the earlier the round the more value. Anyway, so this was total future WAR by draft round. Again, not important whether this is career WAR, or WAR through age 30, or WAR before reaching free agency, or whatnot. Y'all can do that heavy lifting after I go thru what I want to show.

Ok, no surprise in terms of the general shape, but maybe there's surprise in the steepness? I dunno. Anyway, so the objective is to create a function to connect all those points.

What helps is if we turn all those values into a "share" of the total WAR. In this data, we have 5261 total WAR. Players in the first round have a total of 2613 WAR, which conveniently is almost exactly 50%. Round 2 players have 11%, and it goes down from there. The total is obviously 100%. This is how it looks.

We instinctively knew that a 2nd and 3rd round pick is worth less than a 1st and 4th. Given the choice, we'd take 1+4 over 2+3. This is a good example of where 1+4 <> 2+3. You get a similar thing with exit velocity, where 110+60 is worth more than 90+80.

Indeed, given that the 1st round pick has 50% of all the WAR, this chart suggests that 1 = 2+3+4...+19+20. That's right, having a 1st round pick is worth the same as all other 19 picks combined. I'd bet you didn't know that! Well, at least that's what this data is saying. You gotta tease it to figure out what else it might be saying.

Back to math. When I look at this data, the first place I go to is 1/x. So, it's a question of what constant to put in the numerator, and how to represent the denominator. Let's start with a simple function of: 0.278/Round. This is how that looks.

As you could have guessed, that first round is woefully undervalued by our first attempt. 0.278/1 is obviously 27.8%, and we needed to have 50%. In addition, the dropoff just isn't there either.

Let's try another attempt, this time, instead of x = Round, let's make it Round-squared. The numerator in this case is 0.626, so naturally, the 1st pick will come out to 62.6%. So, the 1st round pick should be somewhere between 1/x and 1/x-squared. However. Look at Round 2. In either scheme, the value is above the data.

So, there's something that is still off. We've been treating Round 1 as a value of 1, and Round 2 as a value of 2. But, what if we made Round 1 a value of 0.5 and Round 2 as a value of 1.5. In other words, the scheme would be 1 / (Round - 0.5) . In this case, the numerator is 0.2. This makes Round 1 worth 40% and Round 2 worth 13.3%. You can see how we're on the right track here.
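The numerators quoted so far look like plain normalizing constants. That is an inference from the numbers, not something spelled out above: pick the constant so the shares over 20 rounds add up to 100%. A small sketch under that assumption, with `fit_numerator` as a hypothetical helper name:

```python
ROUNDS = range(1, 21)  # assumes 20 rounds, as in the 1 = 2+3+...+20 comparison

def fit_numerator(denominator):
    """Constant c such that c / denominator(round) sums to 1 over all rounds."""
    return 1 / sum(1 / denominator(r) for r in ROUNDS)

# The three denominators tried in the text
forms = {
    "1/Round": lambda r: r,
    "1/Round^2": lambda r: r ** 2,
    "1/(Round-0.5)": lambda r: r - 0.5,
}

numerators = {name: fit_numerator(d) for name, d in forms.items()}

for name, d in forms.items():
    c = numerators[name]
    print(f"{name:14s} numerator={c:.3f}  round1={c / d(1):.1%}  round2={c / d(2):.1%}")
```

Under that assumption the three constants land close to the 0.278, 0.626 and 0.2 used above, and the same loop makes it easy to try any other denominator against the data.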

Read More

(4) Comments • 2024/02/06

Tuesday, January 23, 2024

Statcast Lab: Distance by Launch Angle and Spray Direction

This is league-wide data, 2021-2023. LHH are "mirrored", so that all their pull data is on the left side of the chart, to match RHH. (click to embiggen)

At each launch angle level, the distance is higher the more you pull.  It has more to do with how well a ball is hit than anything.  

At 28 degree launch for example, distance is maximized when you aim for the LF/CF gap.  The more you hook, or the more you slice, the more likely you mishit the ball (lower speed).  There's also the effect of the spin of the ball (the more you square up, the more likely you have backspin, and the more you mishit, the more sidespin.  Just think of how you golf.)  

Saturday, January 20, 2024

Statcast Lab: Delta-Frequency Maps

I've done Delta Maps using Distances by launch angle x speed and comparing to the league average, or showing wOBA changes year to year along the same lines.

Kyle Bland showed a really snazzy one by using launch angle and some derivation of spray angle, and comparing the frequency of the player (Bo Bichette in this case) to the league average.  It's really nice.  So, I did the same thing, not as nice, but, it is more accurate since I use the actual spray direction, as well as showing the numerical values.   Make no mistake, if I was as talented as Kyle, I'd overlay what I just did with heat maps as well.  I'm not, so I won't.  (Click to embiggen.)

This shows how many batted balls Bo Bichette (2021-2023) has at that particular combination of launch angle and spray direction, compared to the league average.

A few notes here.

  1. The top row is the spray direction, where -45 is 3B foul line and +45 is 1B foul line, with 0 up the middle
  2. The left column is the launch angle from -90 (down the ground) to +90 (straight up), with 0 being horizontal to the ground
  3. Any batted ball short of 10 feet is put into its own short distance basket, labelled above as Chop
  4. Any batted ball that was caught in foul territory is under Foul

So a few more notes:

  • Bichette is a HEAVY groundball batter: in addition to all those reds you see in the grand column at +4, -4, and -12 degrees, there's the huge red of 42 more choppers than league average
  • You can also see the complete lack of popups, at 36 degrees of launch and higher
  • Similar to Kyle's snazzy chart, we can see a preponderance of groundballs hit to the 1B side, and a lack of popups to the left field
  • And much fewer foul outs than the league average

So, yeah, Kyle's presentation is brilliant, and we can tell a far better story by showing it relative to the league average.  Thank you Kyle.

UPDATE: Here is Mookie Betts (click to embiggen)

Betts (2021-2023) is a big time flyball hitter. But good flyballs, not popups.  

  • You can also see the complete lack of choppers, having 97 fewer than the league average.  Betts has 149, and the league average is 97 above that, or 246.  
  • He pulls all his line drive and flyballs, and really abandons the right side infield and short outfield.
  • A reminder that 28 degrees of launch angle is where you find most homeruns, though you can get them also at 20 degrees if you pull them enough
(16) Comments • 2024/01/23 • Batted_Ball

Friday, January 19, 2024

Statcast: Theoretical and Actual HR Park Factors

As we know, Coors helps batters tremendously with the carry of the ball, on the order of twenty feet, on 400 foot batted balls.  However, Coors also happens to be the deepest park in MLB, sixteen feet deeper for homeruns.  So, on the one hand, the environment adds 20 feet, while on the other hand, its configuration costs the batter 16 feet.  The net effect is +4 feet.  While that may not sound like much, each foot adds 3% HR, for an estimated +12% HR.  Since 2020, Coors has been at +9%.  So, that's a pretty good match in terms of actual HR being hit compared to expected based on the physical and environmental characteristics.
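The theoretical side of that comparison is simple enough to sketch. The function name is made up for illustration, and the straight 3%-per-foot scaling is just the rule of thumb quoted above, applied linearly.

```python
def estimated_hr_effect(carry_ft, extra_depth_ft, pct_per_foot=3.0):
    """Net distance effect of a park, and the implied HR change in percent.

    carry_ft: extra carry from the environment on a 400 foot batted ball
    extra_depth_ft: how much deeper the park plays for homeruns
    pct_per_foot: rule of thumb that each net foot is worth about 3% HR
    """
    net_ft = carry_ft - extra_depth_ft
    return net_ft, net_ft * pct_per_foot

# Coors figures from the text: +20 feet of carry, 16 feet deeper
net_ft, hr_pct = estimated_hr_effect(carry_ft=20, extra_depth_ft=16)
print(f"net {net_ft:+d} feet, estimated {hr_pct:+.0f}% HR")
```

For Coors that is +4 feet and an estimated +12% HR, to set against the +9% actually observed since 2020.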

Here it is for all ballparks with GABP leading the way on one end, and Comerica on the other end (click to embiggen).

(4) Comments • 2024/01/22 • Parks

Statcast: How credible are swing speeds for batters?

A typical batter will have about 1.85 swings per plate appearance, of which 90% are competitive swings (excluding half-swings and failed checked swings, etc). At 600 plate appearances, that comes out to 1000 competitive swings. Suppose you take a random sample of 100 swings? How representative of their true swing speed would that be? As you can imagine, it would be incredibly high. Now, what about 50 swings? 20? 10? What is the credibility level?

What I did was very straightforward: I took 100 random swings for a batter, and correlated to 100 other random swings for that batter. I did that for every batter with at least 200 swings. The correlation came in at r=0.98.

I ran this with 99 swings (for batters with at least 198 swings) and 98 and on and on down to 1 swing (min 2 total swings). Correlation at r=0.95 happened at only 33 swings. Correlation at r=0.90 happened at only 17 swings. Correlation at r=0.80 happened at only 7 swings.

Here's how the chart looks for every point from 1 to 100 swings (those are the blue dots). Click to embiggen.

The orange line is the regression amount, the ballast, the amount of league average swings to add. For you Bayesians out there: that's the prior amount you'd add to the Beta Distribution. As you can see, this number hovers at just under 2 swings. In other words, after 2 swings, the average swing speed of the batter in question is half-real.  We can therefore say the Credibility Level is just under 2 swings.

The dotted line is the Reliability Level: swings / (swings + 1.8). While not as credible as pitch speed, swing speed is not far off.
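As a rough sketch, the reliability curve and the regression it implies look like this. The 1.8 ballast is the figure above; the swing speeds in the last line are hypothetical numbers picked only to show the blend, and the function names are illustrative.

```python
BALLAST = 1.8  # league-average swings to add, per the orange line

def reliability(swings, ballast=BALLAST):
    """Share of the observed average that is 'real': swings / (swings + ballast)."""
    return swings / (swings + ballast)

def regressed_speed(observed_avg, swings, league_avg, ballast=BALLAST):
    """Blend the observed average with `ballast` swings at the league average."""
    return (observed_avg * swings + league_avg * ballast) / (swings + ballast)

for n in (7, 17, 33, 100):
    print(n, round(reliability(n), 2))

# Hypothetical speeds: with swings equal to the ballast, the estimate
# sits halfway between the batter's observed average and the league.
print(regressed_speed(observed_avg=76.0, swings=1.8, league_avg=72.0))
```

The curve reproduces the checkpoints quoted earlier: about 0.80 at 7 swings, 0.90 at 17, 0.95 at 33 and 0.98 at 100.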

(3) Comments • 2024/01/22 • Bat_Tracking

Statcast: Amount of Extra Carry at Each MLB Ballpark, 2020-2023

Source data on Savant.  These are essentially how much more or less carry a batted ball hit at 400 feet will have.  (click to embiggen)

Note: each foot affects HR rates by about 3%.

(6) Comments • 2024/01/21

Thursday, January 18, 2024

Statcast: wOBA by Distance

350 feet to 430 feet. This is the key.  

Once you can hit a ball 430 feet, every extra foot is irrelevant.  Hitting a ball 430+ feet is a HR, regardless of distance, and hence the wOBA value of 2.

When you hit a ball under 350 feet, well, adding distance, or SUBTRACTING distance, is about the same. When you hit a ball under 200 feet, every extra foot helps. But once you get to 220 feet, every foot HURTS.  Until you get to 320-330 feet or so.

So, if you look at all batted balls 0 to 350 feet, as a group, it's basically immune to extra or lost distance.  Adding a foot or subtracting a foot doesn't change anything.

The rapid acceleration happens at 350+.

Now, if you follow baseball, you can guess the reason: there's a gap between the infield and outfield.  Infielders play up to 150 feet from home plate, while outfielders play starting at 280 feet from home, up to about 330 feet from home.  So, you can get success between the infield/outfield, or beyond the outfielders (and/or beyond the fence).

When you hit a ball at 95 mph, at the ideal launch angle (roughly 24-32 degrees), that ball will travel about 350 feet.  This is why the Hard Hit rate really starts at around 95mph.  It's not arbitrary.  90 mph is not enough to get you to 350.  And 350 really is a threshold that needs to be cleared.  Naturally, 100 is better than 95 and 105 is better than 100.  Just saying 95+ for hard hit is just a gateway to better understanding Exit Velocity.

And so, when you look at a ball having more or less carry because of wind or any other reason, it's players who hit the ball 350-430 feet that are going to be the most affected.

Monday, January 15, 2024

Statcast Lab: Introducing Catcher Lunges

I love JT Realmuto. And it pleases me to no end that he would come out on top in various catcher metrics I've created during my time here at MLBAM working on Statcast. After I develop metrics, I always check to see how Kiermaier and Betts and Realmuto and so on do. He still comes out very well on throwing and blocking. And up until 2023, he was excellent in framing. In 2023, however, he was very different.

When it comes to a very toolsy metric like what we have on Savant, the level of uncertainty is fairly low. Why is that? Because there is very little inference going on. Most fielding metrics, and really any of them pre-tracking, it's all about inferences. But here, we are simply reporting what was being measured or tracked. But, I know that seeing a number like -13 runs, when that is preceded by +7, +3, +4, 0 seems off.

Since I've started looking at the catcher locations on called pitches, I was interested in developing a new metric, Lunges. You know those pitches: the catcher is on one side of the plate, while the pitch is going the wrong way, so the catcher lunges to catch the errant pitch, even if it's in the strike zone.

The best catcher at Lunges (at least on 4-seam fastballs, RHH v RHP) is Matt Thaiss. You can see the description of the data I was going after in the previous article. In this one, I further limited it to pitches where the catcher was located on the inside part of the plate. He faced 21 pitches in the outside part of the strike zone, and all 21 were called strikes. That's well above the 85% for the league average. He caught 57% in the shadow area, where the league average is 38%. All in all, of the 58 pitches in these regions, he caught 34 that were called strike, while the expectation was 27. That's +7. Again, just limited to RHH v RHP on 4-seamers. Eventually, I'll make sure to cover everything.

Realmuto however. He only got 56% strikes on pitches clearly over the plate. When Thaiss gets to 85%, and JT is at 56%, that's certainly alarming. For pitches in the shadow area, JT only got 8% strikes (1 of 12), while Thaiss was at 57%. All in all, JT got 10 called strikes out of 36 pitches, whereas the league average is at 18. That's -8.

Now, I hear you, small sample size. Forget about: I hear you. I say it! You hear me. The larger point is that JT is at -13 runs for the season on all pitches. This is just one snippet to show where JT failed and where Thaiss succeeded. Given that Thaiss was -1 runs overall, this must mean that there were other areas where Thaiss did not do well. Lunges however, is where he did do well.

Now, off to watch some video of these two catchers to see if the eye test matches what we've just learned here.

(2) Comments • 2024/01/16 • Player Tracking