
Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Monday, December 01, 2014

Lessons from Bill James and MGL

I have a pretty crazy memory system.  I might forget something very recent, which explains why I haven't bought new light bulbs for two weeks already.  But I also remember stuff I read from 30 years ago, which is why I can remember almost everything Bill wrote.  It's embarrassing, but I remember his research more than my own, even my most recent stuff.

Anyway, one thing that Bill said was something like "don't multiply or divide A by B unless there's a gosh-darn good reason to do so".  Not only multiplication, but also addition and subtraction.

Just last night, during the Grey Cup, I was handed a wonderfully compiled dataset from WAR-on-Ice.  Two of the stats that have taken hold among the hockey followers are "Corsi" and "Fenwick", which are silly names that simply mean "Shot Differential" or "Shot Ratio".  The distinction between the two is the kinds of shots that are included.  But, all shots are treated the same.  That is, all shots are added together, in an UNWEIGHTED fashion.  This would be like coming up with a stat called "Batting Average", and making no distinction between a single and HR.  Or coming up with a stat called "On Base Percentage", and making no distinction between a walk and HR.  This is why Slugging Average is superior to Batting Average.  And this is why Weighted On Base Average (wOBA) exists to be superior to all of them.

So, don't add numbers just because you can.  Figure out WHY and HOW they need to be added.  And since goals contain (much much) more information than non-goal shots, then clearly, we can't just go ahead and add goals to non-goals in an unweighted manner.  Well, you CAN if all you care about is "possession time".  That's (probably) a good way to do it.  But, more important than possession time is QUALITY of possession time.
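To make the weighted-versus-unweighted point concrete, here is a minimal sketch.  The function name, the team totals, and especially the goal weight of 5 are all made up for illustration; the actual weight is exactly what the research would have to find.

```python
def weighted_shot_diff(goals_for, nongoal_for, goals_against, nongoal_against,
                       goal_weight=1.0):
    """Shot differential in which a goal counts goal_weight times a non-goal shot.

    goal_weight=1.0 reproduces the plain, unweighted shot differential.
    """
    weighted_for = goal_weight * goals_for + nongoal_for
    weighted_against = goal_weight * goals_against + nongoal_against
    return weighted_for - weighted_against


# Hypothetical team totals: outshot overall, but scoring more goals.
unweighted = weighted_shot_diff(30, 270, 20, 300)                  # every shot counts the same
weighted = weighted_shot_diff(30, 270, 20, 300, goal_weight=5.0)   # placeholder weight, an assumption

print(unweighted, weighted)
```

With these invented totals the unweighted number says the team is being outplayed, while the weighted number says the opposite.  Which reading is closer to the truth depends entirely on the weight.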

That's a basic lesson from Bill.  MGL said something important as well, again paraphrasing: since no two things can possibly be exactly equal, then you have to figure out in which direction you have to move something to make them come close to being equal.  It should be obvious that a goal and non-goal shot aren't EXACTLY equal.  So, if you had to guess which of the two you would weight more than the other, which would it be?  Would you weight the goal more or the non-goal shot more?  Right, it's obvious, the goal has to get more weight.  Once you accept that, the search is on. 

The question that I always have, the question that guides me, and really the basis of all my research is: to what DEGREE is something true.  I accept as a matter of fact that a clutch skill exists.  Why?  Because nothing is exactly truly random, when it comes to dealing with humans.  Heck, even with machines.  But that is irrelevant.  What we care about is the degree to which something exists and the degree to which it can be measured as having an impact.  The clutch skill exists to the point where it can be an actionable item as a tie-breaker.  That's pretty much all it is.  If you have two hitters who are overall equal, but one is a LHH and the other is a RHH, the LHH can have the worst clutch skill and the RHH can be the king of clutch, but if a RHP is on the mound, it's the LHH that you send out.  (Presuming normal hand-split skills for all concerned.)

So, the search is on for goals and non-goal-shots.  And then within the subset of non-goal-shots, can we weight them differently?  As a case for further advancement, we'd want to know how far from the net the non-goal shots were.  Heck, even for the goals, we'd want to know that.

And the same applies for basketball.  You have the same kind of events with basketball that you have with hockey.  What correlates to future point differential, and to what degree does it correlate?  What you do NOT want to do is correlate the stats to CURRENT point differential.  That's because there's an inherent "x = x" kind of correlation to deal with.  This is (probably) why something like Wins Produced gets slammed.  In order to test a metric, you need to test it out of sample.  But, I don't know enough about this particular metric to say anything more.
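Here is a rough sketch of what an out-of-sample test could look like, using hockey-style events.  Everything in it is an assumption for illustration: the league is simulated, the talent ranges are invented, and the candidate weights are arbitrary.  It is not the WAR-on-Ice data and not anyone's published method.  The only point is the shape of the test: build the metric from the first half of the season, then compare its correlation with the SAME half's goal differential (the "x = x" trap) against its correlation with the NEXT half's goal differential.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_half(rng, shots_per_game, shooting_pct, games=41):
    """Return (goals, non-goal shots) for one half-season of a made-up team."""
    shots = sum(rng.randint(shots_per_game - 8, shots_per_game + 8)
                for _ in range(games))
    goals = sum(rng.random() < shooting_pct for _ in range(shots))
    return goals, shots - goals


def simulate_league(seed=0, teams=30):
    """Simulate two half-seasons per team from the same invented talent levels."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(teams):
        # Invented talent: (shots per game, shooting percentage), for and against.
        offense = (rng.randint(26, 34), rng.uniform(0.07, 0.09))
        defense = (rng.randint(26, 34), rng.uniform(0.07, 0.09))
        for half in (first, second):
            gf, sf = simulate_half(rng, *offense)
            ga, sa = simulate_half(rng, *defense)
            half.append({"goal_diff": gf - ga, "nongoal_diff": sf - sa})
    return first, second


def metric(rows, goal_weight):
    """Weighted shot differential: goal_weight * goals + non-goal shots."""
    return [goal_weight * r["goal_diff"] + r["nongoal_diff"] for r in rows]


first, second = simulate_league()
current_gd = [r["goal_diff"] for r in first]   # same sample as the metric
future_gd = [r["goal_diff"] for r in second]   # out of sample

results = {}
for w in (0, 1, 2, 5, 10):  # arbitrary candidate weights
    m = metric(first, w)
    results[w] = (pearson(m, current_gd), pearson(m, future_gd))
    print(f"weight={w:>2}  r(current)={results[w][0]:+.3f}  r(future)={results[w][1]:+.3f}")
```

As the goal weight grows, the metric converges toward first-half goal differential itself, so its correlation with current goal differential is pushed toward 1 whether or not it tells you anything about the future.  The column that matters is the future one, and with real data that is where the search for the weight would happen.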

So, for you soccer and football (and basketball and hockey) researchers out there: I'd like to see your research along the lines I just described for hockey.  What does predict future scoring differentials, and how much do you have to weight the various events?  (And focus on teams, rather than players.  Players will come next.)
