Saturday, January 03, 2015
Overfitting
This fellow tries to simplify the algorithm to estimate who will make the HOF. However, because he only works with backwards data, he offers NO opportunity to test with out-of-sample data. That leads to massive overfitting. How massive? His first rule for hitters is that a player will make the HOF if he scores more than 1197 runs (at an 88% success rate). Steve Finley is at 1443. He'll never make it. Neither will Luis Gonzalez, Bernie Williams, Brett Butler, Darrell Evans, Tony Phillips, Julio Franco, Dave Parker, Ray Durham, Chili Davis, Don Baylor, or Edgar Renteria. And a host of others. Could a first pass have excluded these guys before using Runs as the first line of defense? Sure, you start with WAR, which the author intentionally ignored. What you REALLY want to do in these tests is identify the most important variables first, those that lead to automatics, and then worry about the nuances later.
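To make the out-of-sample point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the players are synthetic (randomly generated, not real career totals), the function names are made up, and the single-cutoff rule on runs is only a stand-in for the kind of rule being criticized, not the author's actual method. The idea is simply to pick the cutoff on one set of players and then check how often it is right on players it never saw.

```python
import random

def accuracy(rows, cut):
    """Share of players for whom 'runs > cut' matches their actual HOF status."""
    hits = sum((runs > cut) == in_hof for runs, in_hof in rows)
    return hits / len(rows)

def precision(rows, cut):
    """Share of players above the cutoff who are actually in the HOF."""
    flagged = [in_hof for runs, in_hof in rows if runs > cut]
    return sum(flagged) / len(flagged) if flagged else float("nan")

def fit_threshold(rows):
    """Pick the runs cutoff with the best accuracy on the rows it is given."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({runs for runs, _ in rows}):
        acc = accuracy(rows, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def split(rows, test_share=0.3, seed=0):
    """Randomly hold out a share of players; returns (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

def make_synthetic_players(n=400, seed=1):
    """SYNTHETIC data: (runs, in_hof) pairs, NOT real career totals."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        runs = rng.randint(300, 2000)
        # Made-up relationship: induction chance rises with runs, with noise.
        p = min(max((runs - 900) / 1100, 0.0), 1.0)
        rows.append((runs, rng.random() < p))
    return rows

players = make_synthetic_players()
train, test = split(players)
cut = fit_threshold(train)  # fitted on the training players only
print("cutoff:", cut)
print("in-sample precision:", precision(train, cut))
print("out-of-sample precision:", precision(test, cut))
```

The number to report is the out-of-sample one. A random split is used here only for simplicity; for a HOF rule it would arguably be more honest to fit on older players and test on more recent ones.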
You can see this with his first two tests for pitchers: Wins over 229 and years LESS than 25. That is an obvious overfit, when the more natural fit would have been to start with wins over 299. And I doubt the second variable is really number of years; more likely it's number of losses or W/L differential or win%, something more real and natural.
Finally: start with the Bill James Hall of Fame monitor. That would have been PERFECT for the researcher to have used, as Bill provides a series of well thought-out and reasonable tests. The researcher could then try to reduce it as much as he could, and still maintain a more reasonable output than one that predicts Julio Franco for the Hall of Fame and not Nolan Ryan.