Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Monday, December 01, 2014

Lessons from Bill James and MGL

I have a pretty crazy memory system.  I might forget something very recent, which explains why I haven't bought new light bulbs for two weeks already.  But I also remember stuff I read from 30 years ago, which is why I can remember almost everything Bill wrote.  It's embarrassing, but I remember his research more than my own, even my most recent stuff.

Anyway, one thing that Bill said was something like "don't multiply or divide A by B unless there's a gosh-darn good reason to do so".  Not only multiplication, but also addition and subtraction.

Just last night, during the Grey Cup, I was handed a wonderfully compiled dataset from WAR-on-Ice.  Two of the stats that have taken hold among the hockey followers are "Corsi" and "Fenwick", which are silly names that simply mean "Shot Differential" or "Shot Ratio".  The distinction between the two is the kinds of shots that are included.  But, all shots are treated the same.  That is, all shots are added together, in an UNWEIGHTED fashion.  This would be like coming up with a stat called "Batting Average", and making no distinction between a single and a HR.  Or coming up with a stat called "On Base Percentage", and making no distinction between a walk and a HR.  This is why Slugging Average is superior to Batting Average.  And this is why Weighted On Base Average (wOBA) exists to be superior to all of them.
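
To make the Batting Average comparison concrete, here is a minimal sketch in Python.  The two hitters and their stat lines are made up for illustration; only the formulas (hits per at-bat, total bases per at-bat) are the standard definitions.

```python
def batting_average(singles, doubles, triples, homers, at_bats):
    # Every hit counts the same: an unweighted sum.
    hits = singles + doubles + triples + homers
    return hits / at_bats


def slugging_average(singles, doubles, triples, homers, at_bats):
    # Each hit is weighted by the number of bases it is worth.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Two hypothetical hitters with the same number of hits and at-bats.
slap_hitter = dict(singles=150, doubles=0, triples=0, homers=0, at_bats=500)
slugger = dict(singles=100, doubles=0, triples=0, homers=50, at_bats=500)

for name, line in [("slap hitter", slap_hitter), ("slugger", slugger)]:
    print(name, batting_average(**line), slugging_average(**line))
```

Both hitters have the same Batting Average, so the unweighted stat can't tell them apart.  Slugging Average can, because it weights the HR more than the single.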

So, don't add numbers just because you can.  Figure out WHY and HOW they need to be added.  And since goals contain (much much) more information than non-goal shots, then clearly, we can't just go ahead and add goals to non-goals in an unweighted manner.  Well, you CAN if all you care about is "possession time".  That's (probably) a good way to do it.  But, more important than possession time is QUALITY of possession time.

That's a basic lesson from Bill.  MGL said something important as well, again paraphrasing: since no two things can possibly be exactly equal, then you have to figure out in which direction you have to move something to make them come close to being equal.  It should be obvious that a goal and non-goal shot aren't EXACTLY equal.  So, if you had to guess which of the two you would weight more than the other, which would it be?  Would you weight the goal more or the non-goal shot more?  Right, it's obvious, the goal has to get more weight.  Once you accept that, the search is on. 

The question that I always have, the question that guides me, and really the basis of all my research is: to what DEGREE is something true.  I accept as a matter of fact that a clutch skill exists.  Why?  Because nothing is exactly truly random, when it comes to dealing with humans.  Heck, even with machines.  But that is irrelevant.  What we care about is the degree to which something exists and the degree to which it can be measured as having an impact.  The clutch skill exists to the point where it can be an actionable item as a tie-breaker.  That's pretty much all it is.  If you have two hitters who are overall equal, but one is a LHH and the other is a RHH, the LHH can have the worst clutch skill and the RHH can be the king of clutch, but if a RHP is on the mound, it's the LHH that you send out.  (Presuming normal hand-split skills for all concerned.)

So, the search is on for goals and non-goal-shots.  And then within the subset of non-goal-shots, can we weight them differently?  As a case for further advancement, we'd want to know how far from the net the non-goal shots were.  Heck, even for the goals, we'd want to know that.
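
Here is a rough sketch of what a weighted shot differential could look like.  The function name, the argument names, and especially the goal weight of 5 are placeholders I made up to show the mechanics; the actual weight is exactly the thing the search is supposed to find.

```python
def shot_differential(goals_for, other_shots_for,
                      goals_against, other_shots_against,
                      goal_weight=1.0):
    """Shot differential where a goal counts goal_weight times a non-goal shot.

    goal_weight=1.0 reproduces the unweighted sum of all shots.
    """
    shots_for = goal_weight * goals_for + other_shots_for
    shots_against = goal_weight * goals_against + other_shots_against
    return shots_for - shots_against


# Hypothetical game: 3 goals on 30 shots for, 1 goal on 32 shots against.
unweighted = shot_differential(3, 27, 1, 31)
weighted = shot_differential(3, 27, 1, 31, goal_weight=5.0)  # 5.0 is a placeholder
print(unweighted, weighted)
```

In this made-up game the unweighted differential says the team was outplayed (-2), while the weighted one says the opposite (+6).  Which one is closer to the truth depends entirely on what the weight should be, and the same idea extends to weighting non-goal shots by distance from the net.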

And the same applies for basketball.  You have the same kind of events with basketball that you have with hockey.  What correlates to future point differential, and to what degree does it correlate?  What you do NOT want to do is correlate the stats to CURRENT point differential.  That's because there's an inherent "x = x" kind of correlation to deal with.  This is (probably) why something like Wins Produced gets slammed.  In order to test a metric, you need to test it out of sample.  But, I don't know enough about this particular metric to say anything more.
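
As a sketch of what testing out of sample might look like: build the metric from one set of games (say, the first half of a season), correlate it to the scoring differential in a DIFFERENT set of games (the second half), and see which weight does best.  The field names, the four teams, and the candidate weights below are all made up for illustration; this is one simple way to set it up, not the only way.

```python
from math import sqrt


def pearson(xs, ys):
    # Plain Pearson correlation; assumes neither list is constant.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def out_of_sample_r(teams, goal_weight):
    # Metric comes from the first sample, the target from the second sample.
    metric = [goal_weight * t["goal_diff"] + t["other_shot_diff"] for t in teams]
    future = [t["future_goal_diff"] for t in teams]
    return pearson(metric, future)


def best_weight(teams, candidates):
    # Pick the candidate weight whose metric best tracks FUTURE goal differential.
    return max(candidates, key=lambda w: out_of_sample_r(teams, w))


# Made-up team lines: first-half differentials plus second-half goal differential.
teams = [
    {"goal_diff": 5, "other_shot_diff": 40, "future_goal_diff": 3},
    {"goal_diff": -2, "other_shot_diff": 10, "future_goal_diff": -1},
    {"goal_diff": 1, "other_shot_diff": -30, "future_goal_diff": 0},
    {"goal_diff": -4, "other_shot_diff": -20, "future_goal_diff": -2},
]
candidates = [1.0, 2.0, 5.0, 10.0]
print(best_weight(teams, candidates))
```

With real data you'd want many more teams and seasons, and since the weight itself is being picked to fit these games, you'd want to confirm it on yet another set of games before trusting it.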

So, for you soccer and football (and basketball and hockey) researchers out there: I'd like to see your research along the lines I just did for hockey.  What does predict future scoring differentials, and how much do you have to weight the various events?  (And focus on teams, rather than players.  Players will come next.)
