Forecast 2003 - Part 2

The individual picks

By Tangotiger, with Alan Jordan

Read part 1 first.

In this installment, we'll take a look at the individual picks. Our pool was based on 32 hard-to-forecast players. Because of limited playing time, 4 players have been dropped from the pool (Giambi, Park, Wright, Ritchie), leaving us with 28 players. The next step is to convert the OPS and ERA into a common unit. To do that, we divide the OPS differences by 0.12 (the standard deviation of OPS among league hitters is around .12, while the standard deviation of ERA among pitchers is around 1.0). This puts the OPS figures on the same scale as the ERA figures. We then calculate the average of the absolute differences between the picks and the actual performance.

For those who had a hard time following that, let's take an example. Say that Derek Jeter's OPS was .840 in a league with a .770 OPS, and that you predicted Jeter's OPS to be .890 in a league of .760. So, his actual performance was +.070 relative to league, and his prediction was +.130. That means the Jeter prediction was off by .060 OPS. Divide this figure by .12 and you get 0.50; when you look at the "0.50" number, think of the Jeter prediction as being off by 0.50 of ERA. Repeat this for every player in the pool, and take the average.
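
For the programmatically inclined, here is a minimal Python sketch of that scoring method. Only the Jeter numbers and the 0.12 scale factor come from the discussion above; the pitcher line and everything else are made up purely for illustration.

OPS_SD = 0.12   # rough standard deviation of hitters' OPS (pitchers' ERA SD is ~1.0)

def error_in_era_units(predicted_diff, actual_diff, is_hitter):
    # Absolute forecast error, converted to ERA-equivalent units.
    err = abs(predicted_diff - actual_diff)
    return err / OPS_SD if is_hitter else err

# Hypothetical pool: (predicted diff vs league, actual diff vs league, is_hitter)
picks = [
    (+0.130, +0.070, True),    # the Jeter example: off by .060 OPS -> 0.50 units
    (-0.40,  -0.90,  False),   # a made-up pitcher: off by 0.50 ERA -> 0.50 units
]

accuracy = sum(error_in_era_units(p, a, h) for p, a, h in picks) / len(picks)
print(round(accuracy, 3))      # lower is better; here it prints 0.5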

Before we get into the individual results, let's take a look at this chart:

The "Accuracy" is simply the figure we derived based on the above process. The lower the number, the better. Each of the individual black lines is the reader level of accuracy, all 165 of them. Each of the 6 forecasters are represented by the red line. As you can see, 4 of them were extremely close to each other. The baseline was right in the middle of the individual forecasters. The readers as a group were also very close to the individual forecasters. The difference between the #1 and #4 forecasters was .01 units. As noted earlier, consider a unit to be equivalent to ERA.

Seven readers beat the best of the forecasters, while 111 lost to the worst of the forecasters. Eleven beat the baseline (the "monkey pick"), and 154 lost to it. The lesson for readers, as individuals, is not to trust your instincts, and simply go with the baseline. The readers as a group came in at 26th place, with a .694 score, which again is worse than the baseline, though only slightly.

Twenty individual readers beat the readers' consensus, while 145 lost to it. The lesson here is that the "market" has good instincts in making its selections.

Statistical Significance

Can the forecasters beat the monkey?

The results from this sample suggest no. The monkey (baseline) had an average error of .679, while the forecasters had a mean error of .693. The mean error for the forecasters is a little higher, but it is not statistically significantly worse (p < .27; t = 1.2, df = 6).

Can the readers beat the monkey?

The results from this sample suggest that not only can the average reader not beat the monkey, the monkey beats the average reader. The average reader had a mean error of .788, which is statistically significantly worse than the monkey's .679 (p < .0001; t = 15.4, df = 165).

Can the average forecaster beat the average reader?

The results from this sample suggest that yes, the forecasters are on average better than the readers. The mean error for the readers, .788, is higher than the mean error for the forecasters, .693, and the difference is statistically significant (p < .0001; t = -7.0, df = 9.4). Note that a t-test for unequal variances was used, because the forecasters have a smaller variance and the difference in variances was significant (p < .0152; F = 10.3, df = 5, 164).
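
For reference, here is a sketch of how an unequal-variance (Welch) t-test and a variance ratio test can be run in Python with SciPy. The arrays are placeholders with roughly the right shape, not the actual contest data.

import numpy as np
from scipy import stats

# Placeholder error scores: 6 forecasters and 165 readers (not the real data)
forecasters = np.array([0.667, 0.675, 0.679, 0.682, 0.70, 0.72])
readers = np.random.default_rng(0).normal(0.788, 0.06, 165)

# Welch's t-test, which does not assume equal variances
t, p = stats.ttest_ind(forecasters, readers, equal_var=False)

# Variance ratio (F) test: larger sample variance over the smaller one
f = readers.var(ddof=1) / forecasters.var(ddof=1)
p_var = stats.f.sf(f, dfn=len(readers) - 1, dfd=len(forecasters) - 1)

print(f"t = {t:.1f}, p = {p:.4f}; F = {f:.1f}, p = {p_var:.4f}")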

Can the forecasters beat the readers' group?

The results from this sample suggest no. The readers' group had an average error of .694, while the forecasters had a mean error of .693. The mean error for the forecasters is a little lower, but it is not statistically significantly better (p < .956; t = -0.1, df = 6).

What's the difference between the readers' group and the average reader? How can there be two different means for one group?

The mean for the readers is a straight mean of the individual readers' errors, while the readers' group score is the error of the readers' forecasts after they have been averaged together to form one super forecast. By averaging the forecasts of all readers, those who overestimate and those who underestimate tend to cancel each other out. As long as the readers have a wide spread of forecasts, but the mean of their forecasts is close to the actual values, the error of the readers' group will be smaller than that of the average reader.
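
If you want to see the effect in code, here is a small Python sketch with made-up numbers for a single player's OPS; only the general idea comes from the discussion above.

import numpy as np

rng = np.random.default_rng(1)
actual = 0.800                          # hypothetical actual OPS for one player
picks = rng.normal(0.800, 0.100, 165)   # 165 made-up reader picks, widely spread

avg_reader_error = np.mean(np.abs(picks - actual))   # mean of the individual errors
group_error = abs(picks.mean() - actual)             # error of the averaged pick

# The averaged pick lands far closer to the actual value, because the
# overestimates and underestimates cancel each other out.
print(round(avg_reader_error, 3), round(group_error, 3))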

If the forecasters are better, why are the top seven finishers readers?

This is simply because the readers had more chances to win: 165 out of the 171 entrants were readers. If all of the readers and forecasters were exactly equal in their ability to forecast, then the odds of any given reader beating all of the forecasters would be 1 out of 7, and we would expect about 23 readers to beat all of the forecasters. We got only 7 readers who beat all the forecasters, which is below what we would expect. Also, the odds of the poll being won by a reader if everyone is equal are 165/171, or 96%. Therefore, while congratulations are in order, we will have to wait until next year's poll to see whether their rank on this one is due to skill or just luck.
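
The arithmetic behind those figures, as a tiny Python sketch:

readers, forecasters = 165, 6
entrants = readers + forecasters              # 171 total entries in the pool

# If everyone forecasts equally well, a given reader outranks all 6 forecasters
# with probability 1/7, so the expected number of readers who do so is:
expected_beat_all = readers / (forecasters + 1)    # about 23.6
# ...and the probability that the overall winner is a reader is:
prob_reader_wins = readers / entrants              # about 0.96

print(round(expected_beat_all, 1), round(prob_reader_wins, 2))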

The Champions

So, who performed the best? Here are the leaders:

Andrea Trento       0.598
Walt Davis          0.605
Bill the Spill      0.620
Michael C Jordan    0.632
Geoff Braine        0.641
Scot Hughes         0.642
John Church         0.656
Forecast - Silver   0.667
chris needham       0.671
Joe Hurley          0.671
Nick Warino         0.672
Forecast - DMB      0.675
Brandon Feinen      0.677
Forecast - ZiPS     0.679
BASELINE            0.679
David Smyth         0.681
Forecast - Palmer   0.682
J.P. Gelb           0.682
Nick Shuman         0.682
D from D            0.683

From the best of the forecasters (Silver) to the 4th best (Palmer), there is only a .015 difference. From Andrea Trento, the winner, to Silver, there is a .069 difference. So we have a clustering around the forecasters. I'm going to relist the above to show this effect.

 0.598 	Andrea Trento
 0.599
 0.600
 0.601
 0.602
 0.603
 0.604
 0.605 	Walt Davis
 0.606
 0.607
 0.608
 0.609
 0.610
 0.611
 0.612
 0.613
 0.614
 0.615
 0.616
 0.617
 0.618
 0.619
 0.620 	Bill the Spill
 0.621
 0.622
 0.623
 0.624
 0.625
 0.626
 0.627
 0.628
 0.629
 0.630
 0.631
 0.632 	Michael C Jordan
 0.633
 0.634
 0.635
 0.636
 0.637
 0.638
 0.639
 0.640
 0.641 	Geoff Braine
 0.642 	Scot Hughes
 0.643
 0.644
 0.645
 0.646
 0.647
 0.648
 0.649
 0.650
 0.651
 0.652
 0.653
 0.654
 0.655
 0.656 	John Church
 0.657
 0.658
 0.659
 0.660
 0.661
 0.662
 0.663
 0.664
 0.665
 0.666
 0.667 	Forecast - Silver
 0.668
 0.669
 0.670
 0.671 	chris needham	Joe Hurley
 0.672 	Nick Warino
 0.673
 0.674
 0.675 	Forecast - DMB
 0.676
 0.677 	Brandon Feinen
 0.678
 0.679 	Forecast - ZiPS, Baseline
 0.680
 0.681 	David Smyth
 0.682 	Forecast - Palmer, J.P. Gelb, Nick Shuman
 0.683 	D from D

How close are these 4 individual forecasters? Comparing the picks of Silver and Palmer player by player, the typical difference was about .363 units (i.e., ERA). However, across the 28 players, those differences almost completely wash out, leaving a gap of only .015 in overall accuracy. Comparing the best and worst forecasters, the average per-player difference was only .227 units, with an overall average difference of .071. This suggests that the worst forecaster was probably consistently off in one direction.
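
To see how large pick-by-pick disagreements can wash out in the overall score, here is a small Python sketch with simulated forecasts for 28 players; the numbers are invented, not the actual picks.

import numpy as np

rng = np.random.default_rng(2)
actual = rng.normal(0.0, 1.0, 28)             # 28 hypothetical outcomes, in ERA-like units
fcst_a = actual + rng.normal(0.0, 0.45, 28)   # two made-up forecasters of similar skill
fcst_b = actual + rng.normal(0.0, 0.45, 28)

pick_gap = np.mean(np.abs(fcst_a - fcst_b))   # typical pick-by-pick disagreement
score_gap = abs(np.mean(np.abs(fcst_a - actual)) - np.mean(np.abs(fcst_b - actual)))

# The per-player disagreement is large, but the two overall accuracy
# scores end up nearly identical.
print(round(pick_gap, 3), round(score_gap, 3))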

Conclusion

All the individual forecasters did a good job, and there's really not much to pick between them. Individual readers should not be trusted, though as a group, they are very intuitive. Go with the monkey. There's little accuracy to be gained beyond that.


Related Links
Baseball HQ
DMB
PECOTA
ZiPS