Monday, December 01, 2014
Lessons from Bill James and MGL
I have a pretty crazy memory system. I might forget something very recent, which explains why I haven't bought new light bulbs for two weeks already. But I also remember stuff I read from 30 years ago, which is why I can remember almost everything Bill wrote. It's embarrassing, but I remember his research more than my own, even my most recent stuff.
Anyway, one thing that Bill said was something like "don't multiply or divide A by B unless there's a gosh-darn good reason to do so". And that goes not only for multiplication and division, but also for addition and subtraction.
Just last night, during the Grey Cup, I was handed a wonderfully compiled dataset from WAR-on-Ice. Two of the stats that have taken hold among hockey followers are "Corsi" and "Fenwick", which are silly names that simply mean "Shot Differential" or "Shot Ratio". The distinction between the two is the kinds of shots that are included. But, all shots are treated the same. That is, all shots are added together, in an UNWEIGHTED fashion. This would be like coming up with a stat called "Batting Average", and making no distinction between a single and a HR. Or coming up with a stat called "On Base Percentage", and making no distinction between a walk and a HR. This is why Slugging Average is superior to Batting Average. And this is why Weighted On Base Average (wOBA) exists to be superior to all of them.
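To put numbers on the Batting Average comparison, here is a minimal Python sketch. The two hitters and their stat lines are made up for illustration; only the formulas (hits over at-bats, total bases over at-bats) are the standard definitions.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A season stat line. The hitters below are hypothetical."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int


def batting_average(line: BattingLine) -> float:
    # Unweighted: every hit counts as 1, whether single or HR.
    hits = line.singles + line.doubles + line.triples + line.home_runs
    return hits / line.at_bats


def slugging_average(line: BattingLine) -> float:
    # Weighted by bases: single = 1, double = 2, triple = 3, HR = 4.
    total_bases = (line.singles + 2 * line.doubles
                   + 3 * line.triples + 4 * line.home_runs)
    return total_bases / line.at_bats


# Two made-up hitters with the same number of hits (150 in 500 AB),
# but a very different mix of hit types.
hitter_a = BattingLine(at_bats=500, singles=130, doubles=15, triples=0, home_runs=5)
hitter_b = BattingLine(at_bats=500, singles=90, doubles=30, triples=0, home_runs=30)

for name, line in (("A", hitter_a), ("B", hitter_b)):
    print(name, round(batting_average(line), 3), round(slugging_average(line), 3))
```

Both hitters bat .300, so the unweighted stat can't tell them apart. Slugging Average separates them (.360 against .540) because it stops pretending a single and a HR are the same event.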
So, don't add numbers just because you can. Figure out WHY and HOW they need to be added. And since goals contain (much much) more information than non-goal shots, clearly we can't just go ahead and add goals to non-goals in an unweighted manner. Well, you CAN if all you care about is "possession time". That's (probably) a good way to do it. But, more important than possession time is QUALITY of possession time.
That's a basic lesson from Bill. MGL said something important as well, again paraphrasing: since no two things can possibly be exactly equal, then you have to figure out in which direction you have to move something to make them come close to being equal. It should be obvious that a goal and non-goal shot aren't EXACTLY equal. So, if you had to guess which of the two you would weight more than the other, which would it be? Would you weight the goal more or the non-goal shot more? Right, it's obvious, the goal has to get more weight. Once you accept that, the search is on.
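As a rough sketch of what weighting the goal more looks like in practice, here is a shot differential with a separate weight on goals. The function name, the parameter names and the weight of 5.0 are all placeholders of mine, not researched values; finding the right weight is exactly the search described above.

```python
def weighted_shot_differential(goals_for: int,
                               nongoal_shots_for: int,
                               goals_against: int,
                               nongoal_shots_against: int,
                               goal_weight: float = 5.0) -> float:
    """Shot differential in which a goal counts goal_weight times a non-goal shot.

    goal_weight = 1.0 gives the plain unweighted differential, where every
    shot counts the same. The default of 5.0 is an arbitrary placeholder.
    """
    weighted_for = goal_weight * goals_for + nongoal_shots_for
    weighted_against = goal_weight * goals_against + nongoal_shots_against
    return weighted_for - weighted_against


# A made-up game: 30 shots each way, but 3 goals for and 1 against.
unweighted = weighted_shot_differential(3, 27, 1, 29, goal_weight=1.0)
weighted = weighted_shot_differential(3, 27, 1, 29, goal_weight=5.0)
print(unweighted, weighted)
```

With a weight of 1 the two teams look dead even. Once the goal gets more weight than the non-goal shot, the team that scored more comes out ahead, and the only open question is by how much.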
The question that I always have, the question that guides me, and really the basis of all my research is: to what DEGREE is something true. I accept as a matter of fact that a clutch skill exists. Why? Because nothing is exactly truly random when it comes to dealing with humans. Heck, even with machines. But that is irrelevant. What we care about is the degree to which something exists and the degree to which it can be measured as having an impact. The clutch skill exists to the point where it can be an actionable item as a tie-breaker. That's pretty much all it is. If you have two hitters who are overall equal, but one is a LHH and the other is a RHH, the LHH can have the worst clutch skill and the RHH can be the king of clutch, but if a RHP is on the mound, it's the LHH that you send out. (Presuming normal hand-split skills for all concerned.)
So, the search is on for goals and non-goal-shots. And then within the subset of non-goal-shots, can we weight them differently? As a case for further advancement, we'd want to know how far from the net the non-goal shots were. Heck, even for the goals, we'd want to know that.
And the same applies for basketball. You have the same kind of events with basketball that you have with hockey. What correlates to future point differential, and to what degree? What you do NOT want to do is correlate the stats to CURRENT point differential. That's because there's an inherent "x = x" kind of correlation to deal with. This is (probably) why something like Wins Produced gets slammed. In order to test a metric, you need to test it out of sample. But, I don't know enough about this particular metric to say anything more.
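Here is one way the out-of-sample test could be sketched in Python for the hockey case. Everything about the setup is my assumption: splitting a season into a first and second part, using a plain Pearson correlation, and trying a fixed list of candidate weights. It is an illustration of the idea, not a finished method.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation. Assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_goal_weight(goal_diff_first, nongoal_diff_first,
                     goal_diff_second, candidate_weights):
    """Return (weight, correlation) for the candidate goal weight whose
    first-part metric correlates best with second-part goal differential.

    Each list holds one value per team:
      goal_diff_first    - goal differential in the first part of the season
      nongoal_diff_first - non-goal shot differential in the first part
      goal_diff_second   - goal differential in the second part (the FUTURE)
    """
    best = None
    for weight in candidate_weights:
        # Build the weighted metric from first-part data only...
        metric = [weight * g + s
                  for g, s in zip(goal_diff_first, nongoal_diff_first)]
        # ...and score it against data the metric never saw.
        r = pearson(metric, goal_diff_second)
        if best is None or r > best[1]:
            best = (weight, r)
    return best
```

The important part is that the metric is built from one set of games and scored against a different set. Score it against the same games and the goals on the left side are partly being correlated with themselves, which is the "x = x" problem.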
So, for you soccer and football (and basketball and hockey) researchers out there: I'd like to see your research along the lines I just did for hockey. What does predict future scoring differentials, and how much do you have to weight the various events? (And focus on teams, rather than players. Players will come next.)