Forecast 2003 - Part 2
The individual picks
By Tangotiger, with Alan Jordan
Read part 1 first.
In this installment, we'll take a look at the individual picks. Our pool was based on 32 hard-to-forecast players. Because of limited playing time, 4 players have been dropped from the pool (Giambi, Park, Wright, Ritchie). That leaves us with 28 players. The next step is to convert the OPS and ERA into a common unit. To do that, we divide the OPS figures by 0.12 (the standard deviation among league hitters' OPS is around .12, while the standard deviation of pitchers' ERA is around 1.0). This puts the OPS on the same scale as the ERA. We then calculate the average of the absolute values of the differences between the picks and the actual performance.
For those who had a hard time following that, let's take an example. Say that Derek Jeter's OPS was .840 in a league with an OPS of .770. Say also that you predicted Jeter's OPS to be .890 in a league of .760. So his actual OPS was +.070 relative to the league, and his predicted OPS was +.130. That means that Jeter's prediction was off by .060 OPS. Divide this figure by .12 and you get 0.50. When you look at the "0.50" number, think of it as if Jeter's prediction were off by 0.50 in ERA. Repeat this with all players in the pool, and find the average.
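The Jeter example can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the code used for the study; the function names are made up for this sketch, and passing a scale of 1.0 for pitchers assumes ERA errors are left in their own units.

```python
OPS_SD = 0.12  # standard deviation of league hitters' OPS, as given in the text
ERA_SD = 1.0   # standard deviation of pitchers' ERA, as given in the text


def error_in_era_units(pred, pred_league, actual, actual_league, sd=OPS_SD):
    """Absolute error of a league-relative pick, scaled to ERA-like units."""
    pred_vs_league = pred - pred_league        # e.g. .890 - .760 = +.130
    actual_vs_league = actual - actual_league  # e.g. .840 - .770 = +.070
    return abs(pred_vs_league - actual_vs_league) / sd


def accuracy(errors):
    """Average of the per-player errors; lower is better."""
    return sum(errors) / len(errors)


# The Jeter example from the text: off by .060 OPS, or 0.50 units.
jeter_error = error_in_era_units(0.890, 0.760, 0.840, 0.770)
print(round(jeter_error, 2))
```

A forecaster's overall score would then be `accuracy` applied to the list of per-player errors for all 28 players.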
Before we get into the individual results, let's take a look at this chart:
The "Accuracy" is simply the figure we derived based on the above process. The lower the number, the better. Each black line is one reader's level of accuracy, all 165 of them. Each of the 6 forecasters is represented by a red line. As you can see, 4 of them were extremely close to each other. The baseline was right in the middle of the individual forecasters. The readers as a group were also very close to the individual forecasters. The difference between the #1 and #4 forecasters was .01 units. As noted earlier, consider a unit to be equivalent to ERA.
Seven readers beat the best of the forecasters, while 111 lost to the worst of the forecasters. Eleven beat the baseline ("the monkey pick") and 154 lost to it. The lesson for readers, as individuals, is not to trust your instincts, and simply to go with the baseline. The readers, as a group, came in at 26th place, with a .694 score, which again is worse than the baseline, though only slightly.
Twenty individual readers beat the readers' consensus, while 145 lost against the consensus. The lesson here is that the "market" has good instincts in making its selection.
Statistical Significance
Can the forecasters beat the monkey?
The results from this sample suggest no. The monkey (baseline) had an average error of .679, while the forecasters had a mean error of .693. The mean error for the forecasters is a little higher, but it is not statistically significantly worse: p less than .27 (t=1.2, df=6).
Can the readers beat the monkey?
The results from this sample suggest that not only can the average reader not beat the monkey, the monkey beats the average reader. The average reader had a mean error of .788, which is statistically significantly worse than the monkey's .679: p less than .0001 (t=15.4, df=165).
Can the average forecaster beat the average reader?
The results from this sample suggest that yes, the forecasters are on average better than the readers. The mean error for the readers, .788, is higher than the mean error of the forecasters, .693, and the difference is statistically significant: p less than .0001 (t=-7.0, df=9.4). Note that a t-test for unequal variances was used because the forecasters have a smaller variance, and the difference in variance was significant at p less than .0152 (f=10.3, df=5,164).
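For readers unfamiliar with the unequal-variance test, here is a minimal sketch of Welch's t statistic and its fractional degrees of freedom (the Welch-Satterthwaite approximation), which is why a non-integer value such as 9.4 can appear. The two samples below are invented purely for illustration; they are not the poll's data, and the sketch is not necessarily how the figures above were computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test that does not assume equal variances."""
    na, nb = len(a), len(b)
    # Squared standard error contributed by each sample.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; generally not a whole number.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical error scores: a tight small group and a wider, larger one.
small_group = [0.68, 0.69, 0.70, 0.69, 0.70, 0.72]
large_group = [0.70, 0.75, 0.80, 0.85, 0.90, 0.72, 0.78, 0.81]

t_stat, dof = welch_t(small_group, large_group)
print(t_stat, dof)
```

A negative t here means the first group's mean error is lower, matching the sign convention of the t=-7.0 reported above.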
Can the forecasters beat the Readersgroup?
The results from this sample suggest no. The readersgroup had an average error of .694, while the forecasters had a mean error of .693. The mean error for the forecasters is a little lower, but it is not statistically significantly better: p less than .956 (t=-.1, df=6).
What's the difference between the Readersgroup and the average Reader: how can there be two different means for one group?
The mean for readers is a straight mean of the individual readers' errors, while the readersgroup figure is the error of the readers after their forecasts have been averaged together to form one super forecast. By averaging the forecasts of all readers, the readers who overestimate and those who underestimate tend to cancel each other out. As long as the readers have a wide spread of forecasts, but the mean of their forecasts is close to the actual values, the error of the readersgroup will be smaller than that of the average reader.
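The distinction can be made concrete with a toy example. The picks and actual values below are invented for illustration (three hypothetical readers, three hypothetical players, already in ERA-like units); the point is only that averaging the picks first, then scoring, gives a different and smaller number than scoring each reader and then averaging.

```python
from statistics import mean

# Hypothetical actual results for three players.
actual = [0.5, -0.2, 1.0]

# Hypothetical picks: one row per reader, one column per player.
readers = [
    [0.9, -0.6, 1.4],
    [0.1, 0.3, 0.5],
    [0.6, -0.4, 1.3],
]


def mean_abs_error(picks, actual):
    """Average absolute miss across players for one set of picks."""
    return mean(abs(p - a) for p, a in zip(picks, actual))


# "Average reader": score every reader, then average the scores.
average_reader_error = mean(mean_abs_error(r, actual) for r in readers)

# "Readersgroup": average the picks player by player, then score once.
consensus = [mean(player_picks) for player_picks in zip(*readers)]
group_error = mean_abs_error(consensus, actual)

print(average_reader_error, group_error)
```

Because the absolute value of an average can never exceed the average of the absolute values, the group error can match the average reader's error at worst, and it is strictly smaller whenever some readers are too high and others too low on the same player.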
If the forecasters are better, why are the top seven finishers readers?
This is simply because the readers had more chances to win: they made up 165 of the 171 entries. If all of the readers and forecasters were exactly equal in their ability to forecast, then the odds of any one reader beating all six forecasters would be 1 out of 7, and we would expect about 23 readers to beat all of the forecasters. Only 7 readers did, which is below what we would expect. Also, the odds of the poll being won by a reader if everyone is equal are 165/171, or 96%. Therefore, while congratulations are in order, we will have to wait until next year's poll to see whether their rank in this one is due to skill or just luck.
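The equal-skill arithmetic is easy to check. The sketch below assumes, as the paragraph does, that every entrant is equally likely to finish in any position; the variable names are ours.

```python
from fractions import Fraction

n_readers = 165
n_forecasters = 6

# Among one reader and the six forecasters, each of the seven is equally
# likely to be the best, so the reader tops all six with probability 1/7.
p_reader_beats_all = Fraction(1, n_forecasters + 1)

# Expected number of readers finishing ahead of every forecaster.
expected_readers_ahead = n_readers * p_reader_beats_all

# Chance that the overall winner is a reader rather than a forecaster.
p_reader_wins_poll = Fraction(n_readers, n_readers + n_forecasters)

print(float(expected_readers_ahead))
print(float(p_reader_wins_poll))
```

The expected count works out to 165/7, between 23 and 24, against the 7 readers who actually finished ahead of every forecaster.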
The Champions
So, who performed the best? Here are the leaders:
Starting from the best of the forecasters (Silver) to the 4th best (Palmer), there is only a .015 difference. From Andrea Trento, the winner, to Silver, there is a .069 difference. So, we have a clustering around the forecasters. I'm going to relist the above to show this effect.
How close are these 4 individual forecasters? Looking at the picks of Silver and Palmer, the typical per-player difference was about .363 units (or ERA). However, over all 28 players, those differences almost completely wash away, leaving an overall difference of .015. Between the best and worst forecasters, the average per-player difference was only .227 units, yet the overall difference was .071. This suggests that the worst forecaster was probably consistently off one way.
All the individual forecasters did a good job, and there's really not much to pick between them. Individual readers should not be trusted, though as a group, they are very intuitive. Go with the monkey. There's little accuracy to be gained beyond that.