Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Thursday, December 28, 2023

Improving WAR - Resolving DIPS (part 1)

Twenty years ago, Voros shook the saber community with one of the most important saber discoveries to that point, and still a top-ten discovery of all saber-time. He called it DIPS, or Defense-Independent Pitching Statistics. My tiny contribution to that was FIP, which is merely a shortcut to the full-fledged DIPS. Had I not invented FIP, Voros would eventually have created it anyway.

The illustrations that Voros provided were extremely compelling. In 1999 and 2000, Pedro Martinez had perhaps the greatest two-season stretch of pitching in the history of baseball. It's difficult to even decide which of the two seasons was the better one. His ERAs were 2.07 and 1.74, and this was in the middle of a high-scoring era. He had 313 strikeouts in one of the seasons and 284 in the other, while pitching only 213 and 217 innings. In the season where he gave up 32 more hits, he also gave up 8 fewer HR. Whichever one you pick, the two stood together as perhaps the best back-to-back pitching seasons ever.

What did Voros point out? If you remove the strikeouts and homeruns, and compare the non-HR hits to all remaining batted balls, you get what he called BABIP (batting average on balls in play). Pedro had a BABIP of .236 in one season, among the lowest in the league, and .323 in the other, among the highest. This seemed ridiculous on its face. How could perhaps the greatest pitcher ever, having one of his two best seasons ever, allow hits on balls in play at a close-to-league-high rate? And how did he pair that with a league-low rate in the other season?
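To make the denominator concrete, here is a minimal sketch of the BABIP calculation. It uses the common modern form, which also counts sacrifice flies as balls in play; Voros's original bookkeeping may have differed in the details. The function name and the pitcher line are hypothetical, not Pedro's actual totals.

```python
def babip(ab: int, h: int, hr: int, so: int, sf: int = 0) -> float:
    """Batting average on balls in play (hypothetical helper).

    Numerator: non-HR hits.
    Denominator: balls in play = at-bats minus strikeouts minus home runs,
    plus sacrifice flies (the common modern convention).
    """
    hits_in_play = h - hr
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return hits_in_play / balls_in_play


# Hypothetical pitcher line: 130 non-HR hits on 480 balls in play.
print(f"{babip(ab=600, h=150, hr=20, so=100):.3f}")  # prints 0.271
```

Once strikeouts and homeruns are taken out of both the top and the bottom, what's left is a rate that can be compared from pitcher to pitcher and season to season.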

This would suggest that allowing non-HR hits on balls in play might be pretty random. After all, Pedro would not pair a league-leading strikeout rate one season with a league-low strikeout rate another season and STILL be one of the best pitchers ever. You couldn't do that with walks either, or homeruns. It just doesn't work like that. But non-HR hits on balls in play? Well, it happened. And it wasn't just Pedro either. While pitchers' SO, BB, and HR rates were fairly stable year to year, their BABIP fluctuated greatly.

In retrospect, we should have known, because Random Variation would have told us. But no one ever looked, not until Voros. The key to his discovery is that Voros created the denominator: balls in play. Once that was done, you could apply basic statistical principles to determine how much Random Variation could have affected BABIP. Assuming 500 balls in play, one standard deviation is roughly 0.46 divided by the square root of 500, or about 20 points (.020) of BABIP. (The 0.46 is the standard deviation of a single ball in play, the square root of .30 times .70, for a league BABIP of around .300.) Two standard deviations is 40 points. So going from 2 standard deviations worse than average to 2 standard deviations better than average is not that noteworthy from a performance standpoint. Look hard enough, and someone will do that year after year. In 1999-2000, that just happened to be Pedro. Even Pedro was subject to Random Variation.
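The arithmetic above can be sketched in a few lines. This assumes each ball in play independently becomes a hit with a fixed probability (a binomial model) and a league BABIP of .300; the function name is hypothetical.

```python
import math


def babip_sd(p: float, n: int) -> float:
    """Binomial standard deviation of BABIP over n balls in play,
    assuming each ball in play is a hit with probability p."""
    return math.sqrt(p * (1 - p) / n)


sd = babip_sd(p=0.300, n=500)  # assumed league BABIP and sample size
print(f"1 SD = {sd:.3f}, 2 SD = {2 * sd:.3f}")

# Pedro's .236-to-.323 swing, expressed in standard deviations.
swing = (0.323 - 0.236) / sd
print(f"swing = {swing:.1f} SD")
```

Under these assumptions one standard deviation comes out near .020, and the 87-point swing is a bit over 4 standard deviations, which is the "2 SD worse to 2 SD better" range described above.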

Still, what do you do with this information, that Pedro had a .323 and a .236 BABIP in back-to-back seasons? This is where you get into ATTRIBUTION and IDENTIFICATION. Suppose that pitching was done by pitching machines. Through Random Variation, you would end up with some games with 3 hits and other games with 13 hits. Nothing changes: it's the same machine, the same opposing batters, the same fielding alignment. Only Random Variation produces the different hit totals. We've identified the entity on the mound (Pitching Machine 4587). But do we attribute the results to that machine? Or is the machine simply inconsequential?
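The pitching-machine thought experiment is easy to simulate. This is a minimal sketch under loudly stated assumptions: a hypothetical machine whose every ball in play is a hit with the same probability (.300), a fixed 30 balls in play per game, and no strikeouts, walks, or homeruns. The function name and parameters are illustrative only.

```python
import random


def simulate_machine(n_games: int, bip_per_game: int = 30,
                     p_hit: float = 0.300, seed: int = 4587) -> list[int]:
    """Hits allowed per game by a hypothetical pitching machine whose true
    hit rate never changes: each ball in play is a hit with probability p_hit."""
    rng = random.Random(seed)  # seeded so the "season" is reproducible
    return [sum(rng.random() < p_hit for _ in range(bip_per_game))
            for _ in range(n_games)]


season = simulate_machine(n_games=162)
print("fewest hits:", min(season), "most hits:", max(season))
print("average:", round(sum(season) / len(season), 2))
```

Nothing about the machine changes from game to game, yet the per-game hit totals spread out widely around the average of about 9; all of that spread is Random Variation.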

Humans, of course, are different. When it comes to human behaviour and human talent, they can influence results. But just because they can influence SOME of the results doesn't mean they can influence ALL the results. We can identify who the pitcher is on the mound, but do we attribute everything that happens to the pitcher? After all, we have human fielders involved, and we have the vagaries of the park and the weather that day. The batters change, and heck, every ball is like a snowflake: no two balls are alike.

Just because we've identified Pedro, and we've calculated a BABIP of .323 one season and .236 another, doesn't mean we attribute all of that to Pedro. There are other entities involved here. Pedro cannot possibly absorb all those outcomes, given that he's only one influence.

Twenty years ago, I was involved in a discussion and research effort called Solving DIPS, which determined, through basic statistical principles, that Random Variation was the largest agent, while the pitcher and the fielders were also significant agents, as was the park.
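One common way to apply those basic statistical principles is to split the observed spread in BABIP into a random part and a non-random part, using var(observed) = var(true) + var(random). This is a sketch of that general idea, not necessarily the exact method used in Solving DIPS. The observed spread below is a hypothetical number, and the non-random remainder lumps together pitcher, fielders, and park; separating those three would need more data than this.

```python
import math


def split_variance(observed_sd: float, p: float, n: int) -> dict[str, float]:
    """Split the observed pitcher-to-pitcher spread in BABIP into a Random
    Variation share and a non-random remainder (pitcher + fielders + park),
    using var(observed) = var(true) + var(random)."""
    var_obs = observed_sd ** 2
    var_random = p * (1 - p) / n                # binomial variance at n balls in play
    var_true = max(var_obs - var_random, 0.0)   # clamp: noise can exceed the observed spread
    return {
        "random_share": min(var_random / var_obs, 1.0),
        "true_sd": math.sqrt(var_true),
    }


# Hypothetical observed spread of .025 among pitchers with 500 balls in play.
print(split_variance(observed_sd=0.025, p=0.300, n=500))
```

With these made-up inputs, about two-thirds of the observed variance would be Random Variation, leaving a non-random spread of roughly 14 points to be shared among the pitcher, the fielders, and the park.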

Next up: we'll set aside all that theory and look at things more factually.
