Tangotiger Blog

Saturday, January 03, 2015

Overfitting

By Tangotiger

This fellow tries to simplify the algorithm for estimating who will make the HOF. However, because he works only with historical data, he leaves NO opportunity to test with out-of-sample data. That leads to massive overfitting. How massive? His first rule for hitters is that a player will make the HOF if he scores more than 1197 runs (at an 88% success rate). Steve Finley is at 1443. He'll never make it. Neither will Luis Gonzalez, Bernie Williams, Brett Butler, Darrell Evans, Tony Phillips, Julio Franco, Dave Parker, Ray Durham, Chili Davis, Don Baylor, or Edgar Renteria. And a host of others. Could a first pass have excluded these guys before using Runs as the first line of defense? Sure, you start with WAR, which the author intentionally ignored. What you REALLY want to do in these tests is identify the most important variables first, those that lead to automatics, and then worry about the nuances later.
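To make the out-of-sample point concrete, here is a minimal sketch of how a single-cutoff rule like "runs > X" could be fit on one set of players and then checked on players the fit never saw. Everything in it is an assumption for illustration: the data is synthetic (not real career totals), and the names `fit_cutoff`, `accuracy`, and `make_synthetic_players` are hypothetical, not anything from the original study.

```python
import random

def accuracy(players, cutoff):
    """Share of players for whom 'inducted if runs > cutoff' is right."""
    hits = sum((runs > cutoff) == inducted for runs, inducted in players)
    return hits / len(players)

def fit_cutoff(players):
    """Pick the runs cutoff with the best in-sample accuracy.

    Ties go to the lowest cutoff, since candidates are scanned in order.
    """
    best_cutoff, best_acc = None, -1.0
    for cutoff in sorted({runs for runs, _ in players}):
        acc = accuracy(players, cutoff)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff

def make_synthetic_players(n, rng):
    """Build synthetic (runs, inducted) pairs. NOT real player data."""
    players = []
    for _ in range(n):
        runs = rng.randint(300, 2000)
        # Assumed relationship: induction odds rise with runs, with noise.
        p = min(max((runs - 900) / 1000, 0.0), 0.9)
        players.append((runs, rng.random() < p))
    return players

rng = random.Random(0)
players = make_synthetic_players(400, rng)
rng.shuffle(players)

# Hold out a quarter of the players; the cutoff is fit without them.
train, test = players[:300], players[300:]
cutoff = fit_cutoff(train)

print("fitted cutoff:", cutoff)
print("in-sample accuracy:", round(accuracy(train, cutoff), 3))
print("out-of-sample accuracy:", round(accuracy(test, cutoff), 3))
```

The number to trust is the last one. A rule whose quoted success rate comes only from the players it was fit on has no held-out number at all, which is the complaint here.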

You can see this with his first two tests for pitchers: Wins over 229 and years LESS than 25. That was an obvious overfit, when the more natural fit would have been to start with wins over 299. And I doubt the second variable should be number of years; more likely it's number of losses or W/L differential or win%, something more real and natural.

Finally: start with the Bill James Hall of Fame monitor. That would have been PERFECT for the researcher to have used, as Bill provides a series of well thought-out and reasonable tests. The researcher could then try to reduce it as much as he could, and still get more reasonable output than a model that predicts Julio Franco for the Hall of Fame and Nolan Ryan not.

(31) Comments • 2015/01/23 • Statistical_Theory

