
Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Hockey


Tuesday, October 24, 2023

NHL Edge - Player Tracking

The NHL released their summary reports based on player tracking.  You can read about it in a few places.  Here's one from The Athletic.  

As this is the first version, I temper my expectations to the reality that we are dealing with production data in a production environment, as a first release.  Sports, unlike all other products and services, has almost no full testing environment available.  Whatever preparation we get from putting tracking in one park (Salt River Fields) for a few weeks is really a drop in the bucket compared to what an MLB season offers: 30 parks x 81 games, with millions watching live.  Things that you might not even know to test for manifest themselves almost immediately in a live game.

So, I look at the first release of anything sports-related as batting practice that everyone treats as a live game.  While we may all think we should be like Tim Raines on May 2, 1987, that's not reality.

Now, let's talk about player speed and MPH.  First I should point out that I am in the minority here, because I am up against Inertial Reasoning when it comes to speed.  Whether in MPH or KPH, speed is presented per hour.  When you throw a ball and you are trying to outrace a motorcycle, sure, that seems reasonable.  But life would be easier had we presented "time to plate" as the go-to number.  See, when you present things in MPH, there's nothing more you can do with that.  It's the end of the line.  If you want to USE that number, that speed in MPH, the very, very first thing you have to do is convert it to feet per second (or meters or yards per second).  The key point is that the denominator is seconds, not hours.

Why is that?  Because then you can actually use that number.  Suppose, for example, I tell you a runner is rounding third at full steam, at 30 feet per second.  How long until he reaches home plate?  That's 90/30, or 3 seconds.  And suppose an outfielder is releasing a ball at that very instant, 250 feet away, and the average flight speed of his throw is 100 feet per second: how long will that ball take to reach the catcher?  That's 250/100, or 2.5 seconds.  That's the story.  The runner will get home in 3 seconds, and the ball will reach the home plate area in 2.5 seconds.  The catcher has 0.5 seconds to do something: either stand there waiting for the runner on a perfect throw, or scramble back to the plate on an offline throw.
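Here is a minimal Python sketch of the arithmetic in the last two paragraphs, assuming constant speeds along a straight path. The 30 ft/s runner, the 250-foot throw, and the 100 ft/s average flight speed are the post's illustrative numbers; the helper names are hypothetical.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_fps(mph: float) -> float:
    """Convert miles per hour to feet per second (1 mph is about 1.467 ft/s)."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


def fps_to_mph(fps: float) -> float:
    """Convert feet per second back to miles per hour."""
    return fps * SECONDS_PER_HOUR / FEET_PER_MILE


def seconds_to_cover(distance_ft: float, speed_fps: float) -> float:
    """Time to cover a distance at a constant speed."""
    return distance_ft / speed_fps


# Runner rounding third vs. a throw from the outfield (illustrative numbers).
runner_time = seconds_to_cover(90, 30)      # time for the runner to reach home
throw_time = seconds_to_cover(250, 100)     # time for the ball to arrive
catcher_window = runner_time - throw_time   # time the catcher has to act

print(f"30 ft/s is about {fps_to_mph(30):.1f} mph")
print(f"runner {runner_time:.1f} s, throw {throw_time:.1f} s, "
      f"catcher has {catcher_window:.1f} s")
```

Note that the MPH figure plays no role in answering the question; the seconds do all the work.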

An outfielder misses catching a ball by 3 feet.  How much faster would he need to be to catch it?  Well, if he was running at full speed for 2 seconds and covered 57 feet (28.5 feet per second), then he'd need to bump that up to 60 feet in 2 seconds (30 feet per second).  I could go on, and have gone on.  The point is simply this: make the number usable, applicable to the task at hand.  And the task at hand is not to just "present a number".  It's to give that number relevance, to let it resonate for the play.
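The outfielder example works the same way. A small sketch, again assuming constant speed over the same 2 seconds; `required_speed_fps` is a hypothetical helper, not anything from the tracking system.

```python
def required_speed_fps(distance_ft: float, shortfall_ft: float, seconds: float) -> float:
    """Speed needed to cover the original distance plus the shortfall in the same time."""
    return (distance_ft + shortfall_ft) / seconds


actual = 57 / 2                          # what he ran, in ft/s
needed = required_speed_fps(57, 3, 2)    # what the catch required, in ft/s
print(f"needed {needed} ft/s vs actual {actual} ft/s: "
      f"{needed - actual} ft/s faster")
```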

Now to hockey: they are showing MPH, which of course is the default position.  But suppose I tell you that Connor McDavid is skating toward the net at 30 feet per second, while Cale Makar defends him by skating backwards at 20 feet per second.  If I just gave you school nightmares about two trains colliding, that's exactly right.  Your nightmare is my dream: I've been waiting for this data all my life.  If McDavid skates for 2 seconds at this speed, he will cover 60 feet of ice.  Makar, in the same 2 seconds, will cover 40 feet of ice.  For McDavid not to beat him one-on-one, Makar needs 20 feet of space between himself and McDavid.  (All numbers for illustration purposes only.)
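The two-trains setup reduces to a closing speed. A sketch using the post's illustrative numbers, assuming both skaters hold constant speeds along the same line toward the net; the function names are hypothetical.

```python
def cushion_needed(attacker_fps: float, defender_fps: float, seconds: float) -> float:
    """Gap (in feet) the defender needs so the attacker can't close it within `seconds`.

    Both skaters move toward the net, so the gap shrinks at the difference in speeds.
    """
    closing_speed = attacker_fps - defender_fps
    return closing_speed * seconds


def seconds_until_caught(gap_ft: float, attacker_fps: float, defender_fps: float) -> float:
    """How long a given gap lasts before the attacker draws even."""
    closing_speed = attacker_fps - defender_fps
    if closing_speed <= 0:
        return float("inf")  # defender is as fast or faster: the gap never closes
    return gap_ft / closing_speed


print(cushion_needed(30, 20, 2))          # McDavid 30 ft/s vs Makar 20 ft/s over 2 s
print(seconds_until_caught(15, 30, 20))   # a 15-foot gap lasts 1.5 s
```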

There's a reason we don't report the speed of 100m and 200m runners in MPH.  It's not relevant, and it won't resonate.  What they do instead is report split times: from 70m to 80m, say, they run in 0.98 seconds or something.  This is something that matters, because it gives them a real target for their overall 100m time.  Shaving 0.02 seconds in that split means shaving 0.02 seconds off their overall time.

It's inevitable that player movement speed (running, skating) will eventually be presented in feet, yards, or metres per second.  Ideally, we can set the standard from the outset, rather than needing to reset it after a long battle.

(2) Comments • 2023/10/24 • Baserunning Hockey

Friday, June 30, 2023

Bayesian Goalie

Many years back, I suggested that we add 3000 shots of league-average performance in order to estimate goalie talent.  We can of course do better if we treat the number of prior games played as a variable (rather than assuming the shots are nothing more than independent trials).  But let's set that aside for the time being.

Here we see research from @spazznolo that suggests we only need to add 900 shots.  

A small note on the horribly named Beta distribution and its equally inane alpha and beta parameters: it's better to use the ballast and the population mean as the two parameters.  In other words, alpha + beta is the ballast, and alpha / (alpha + beta) is the mean.  It makes more sense.
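A minimal sketch of the ballast/mean parameterization, assuming a plain Beta-Binomial model (each shot is a save or a goal, and shots are independent trials, which is exactly the simplification set aside above). The 900 and 3000 ballasts come from the post; the .905 league save percentage and the goalie with 920 saves on 1000 shots are hypothetical numbers for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    mean: float     # population (league) save percentage
    ballast: float  # number of league-average shots added to every goalie

    @property
    def alpha(self) -> float:
        return self.mean * self.ballast          # league-average saves added

    @property
    def beta(self) -> float:
        return (1 - self.mean) * self.ballast    # league-average goals added


def regressed_save_pct(saves: int, shots: int, prior: BetaPrior) -> float:
    """Posterior mean: the goalie's record plus `ballast` shots at the league mean."""
    return (saves + prior.alpha) / (shots + prior.alpha + prior.beta)


# Hypothetical goalie: 920 saves on 1000 shots (.920), league mean .905.
for ballast in (900, 3000):
    prior = BetaPrior(mean=0.905, ballast=ballast)
    print(ballast, round(regressed_save_pct(920, 1000, prior), 4))
```

Going from 900 to 3000 shots of ballast pulls the same goalie noticeably closer to the league mean, which is why the choice of goalie universe matters.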

As for 900 or 3000 or whatever it is: it really depends on your universe of goalies.  Both could legitimately be valid, each one applicable only under its own selection criteria for goalies.  Hence my point that the prior number of games played should be used as a talent indicator, rather than assuming games played carries no information about talent.

Monday, March 27, 2023

NHL Draft, using the Gold Points

There is a rather clever standings model proposed by Adam Gold (@winunlimited).  Let's apply a small variation: starting at the All-Star break, a team declares whether it will forego a run at the Stanley Cup and instead make a run for the top draft pick (Connor Bedard, let's say).

At the all-star break, courtesy of data provided by the kind and generous @domluszczyszyn, eleven teams had almost no shot at the Stanley Cup.  Those teams would all declare they are giving up on the Cup and are all-in on the Draft (aka, the Bedard sweepstakes).  Every game now counts toward the Draft.

The wonderful thing about the Gold points (besides the amazingly great name, and how lucky we are that Adam is named Gold) is that every win counts toward something.  Tanking is a thing of the past.

Two games after the All-Star break, the Sabres would also likely declare themselves for the Draft.  The Sabres went on a decent run there for a while, winning 5 of 6, and so were able to climb the Gold Standings pretty well, even though they will end up with a couple fewer games than the rest.  At some point in early to mid March, the Capitals would also have declared.

If things went as above, this is how the Gold Standings would look:

Gold TEAM
30 Vancouver Canucks
29 Arizona Coyotes
24 Ottawa Senators
23 St. Louis Blues
21 Detroit Red Wings
20 Buffalo Sabres
20 Chicago Blackhawks
20 Montréal Canadiens
19 Anaheim Ducks
19 Columbus Blue Jackets
17 Philadelphia Flyers
12 San Jose Sharks
7 Washington Capitals

As you can see, the Canucks and the Coyotes would be fighting for Bedard (or whoever they want).  And every game becomes important.  And this is true whether you go for the Cup or not.

The Predators and Flames would be the next teams to try to figure out when they'd declare for the draft.

One note: while I said "All-Star break", we'd probably have to make it something like "after 50 games".  At the All-Star break, games played ranged from 48 to 54, so you couldn't use a date: the number of remaining games would differ from team to team at the moment everyone gets the chance to declare.  It's just a matter of selecting the game number.  You don't want it too early.  Probably 41 games (the halfway point) is the fewest games before a team can declare for the Draft.

Monday, January 30, 2023

Science behind the genius of Lidstrom

An old research paper I just came across.  And here's the summary.

(1) Comments • 2023/01/30 • Hockey

Saturday, January 14, 2023

WOWY Hasek

I went through WOWY for a bunch of goalies about 18 months ago.  WOWY is With or Without You.  The idea is to compare how many goals a team allows with that goalie in net, and without that goalie in net.  There are, of course, two big things that conspire against us.  The first is that we are counting on the "without" goalie being of similar caliber for all our goalies, or at least for the goalies we care about.  When you have Fuhr paired with Moog for an extended period of time, that's not going to work.  So this process works for a lot of goalies, but not every goalie.  The second is that Random Variation is going to rear its ugly head.  While it SEEMS we are isolating the goalie, there are other things going on.

In any case, with those caveats, the results of that process still end up looking pretty good at the career level.  Maybe there are a couple of goalies missing (like King Henry), and maybe a goalie or two is higher than they should be.  The beauty of this process is its simplicity, even if it's too simple.

I should also point out this is regular-season only.  Patrick Roy's brilliance in the post-season would be difficult to include even if we wanted to, since in the post-season there are no "without" goalies.  To the extent that there are, it means the star goalie had a rough series.  In other words, in their great playoff runs, with no without-goalie, their performance wouldn't count!  So, just be careful out there.

Anyway, those following The Athletic's 99-after-99-since-4 series will note that the top 2 goalies below are in fact in their top 20 overall.  Their goalie rankings are in parentheses below.  They had King Henry 7th and Fuhr 9th.

  1. -466 Hasek (1 or 2)
  2. -307 Roy (1 or 2)
  3. -295 Esposito (5)
  4. -212 Luongo (8)
  5. -210 Dryden (4)
  6. -196 Bernie Parent (6)
  7. -173 Brodeur (3)
  8. -157 Vanbiesbrouck (unranked)
  9. -142 Price (13)

There are other goalies I didn't look at, like Chico Resch, Billy Smith, etc., and it's possible a few others may squeeze into the list.

Sunday, October 30, 2022

Connor McDavid, plus/minus, and Pitcher Won-Loss records

Background

The plus/minus stat, along with save percentage, was the NHL's (and WHA's) first foray into the enhanced statistics revolution that has taken hold of MLB. I believe it made its first public appearance in The Hockey News Yearbook in the 1970s. At least, that's where I discovered it contemporaneously. And if I can trace a moment in time where the concept of enhanced analytics planted its seed in my head, it would be then. That would be several years before I discovered Bill James and Pete Palmer.

Ron Andrews was the chief statistician for the NHL. Frank Polnaszek did terrific work for the WHA (he was also statistician for the New England Whalers). Ron, I presume, saw that all the Bruins of the Orr/Espo teams had pluses, no matter how bad a player, and all the expansion Capitals had minuses, no matter how good a player. Clearly there's a heavy bias in the plus/minus stat. Instead of the plus/minus TALLY (the version in place now), Ron presented it as a PERCENTAGE: the goals on the ice for that player divided by the goals for the team, for both goals scored and goals allowed. Ron even went the extra mile and counted only the team's goals in the games the player played in.

What this did was ensure that the average player of each team would always have a plus/minus of 0%. This is true whether it was the powerhouse Bruins, or the Capitals. So, whereas the TALLY method went too far in one direction, the PERCENTAGE method went too far in the other direction. Of the two however, the PERCENTAGE method was the better way. Unfortunately, once the NHL made plus/minus an official stat (with sponsorship for awards), the simple-to-explain TALLY method is what became de rigueur. In addition, there's the issue of including shorthanded goals in the pluses for the PK team and minuses for the PP team. While not necessarily a poor choice, it was less than desirable.

In any case, what is official is not necessarily what is right. And given that we have access to all the data, we can create our own versions. And the lovely Natural Stat Trick website segments that data for us in an easy-to-use form. For the rest of this post, I'll ONLY refer to the 5-on-5 data presented at NST.

Interlude 1: Random Variation...

Let me take another interlude and talk about Pitcher Won-Loss records. We know they are not to be referenced or trusted, but why? Let me describe a simple shortcut, a not-bad method, to try to understand the idea behind Random Variation: 1/sqrt(N). Take the number of decisions, say 25, take its square root, meaning 5, and then take its reciprocal, meaning 1/5, or 0.2. And so, a true .600 pitcher will, with 25 decisions, see a winning % of +/- 0.200 around that .600. That means a true .600 pitcher will see anything from .400 to .800, just by Random Variation, over 25 decisions.

That's why we can't trust it. At the same time, when it comes to a career, the signal is able to overcome most of the noise. If you have 400 decisions, that gives us 1/20, or 0.050. And so a true .600 pitcher will have a .550 to .650 win% by Random Variation over a 400-decision career. Still not great, but also not the abysmal view of a single season. Over a period of a few years, say 121 decisions, that's 1/11, or 0.091. So you can see that a true .600 pitcher over 121 decisions will almost always finish with an over-.500 record. That's how long it takes for Random Variation to work.

So, the more games, the more tightly your observations cluster around the true center, despite Random Variation.
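The 1/sqrt(N) shortcut is easy to check in code. For reference, the binomial standard deviation of a win% is sqrt(p(1-p)/N), which is at most 0.5/sqrt(N), so the 1/sqrt(N) margin works out to roughly two standard deviations (about 2.04 for a .600 pitcher). A sketch; the function name is hypothetical.

```python
import math


def rough_range(true_rate: float, decisions: int):
    """Rule of thumb: observed win% lands within +/- 1/sqrt(N) of the true rate."""
    margin = 1 / math.sqrt(decisions)
    return true_rate - margin, true_rate + margin


for n in (25, 121, 400):
    lo, hi = rough_range(0.600, n)
    sd = math.sqrt(0.600 * 0.400 / n)  # one binomial standard deviation
    print(f"{n:>3} decisions: {lo:.3f} to {hi:.3f}  (1 SD = {sd:.3f})")
```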

... and Systematic Bias

There is another concept to worry about: Systematic Bias. Think of Andy Pettitte, playing most of his career with the powerhouse Yankees: his observed win% is constantly being pulled AWAY from his true center, whatever it is. The Yankees hitters are helping there. And the more games, the more that bias is strengthened. Bias is a big problem because a pitcher's Won-Loss record is not a pure pitcher thing. First off, half the W-L is driven by the offense and half by the defense. The defense itself is driven mostly by the pitching, but also somewhat by the fielding. The pitching itself is driven largely by the starting pitcher, but also often enough by the relief pitchers. Once you start splitting up that pie, you end up with the starting pitcher contributing about one-third to the won-loss record. The rest is bias that needs to be handled. If it's like Pettitte, it's a bias that large volume doesn't wash out, because the same variables are always in play. But if it's a pitcher who moves around quite a bit, the bias tends to cancel out, as the offensive support is more evenly distributed, the bullpen support as well, and so on.

In other words, whereas Random Variation depends on quantity for the signal to overcome the noise, Systematic Bias takes hold more strongly the more quantity you have.

Pitcher Won-Loss Records

While there are ways to try to correct the issues with Pitcher Won-Loss Records, we don't bother for the most part. Why? Because what we'd need to do is use ERA and FIP and offensive run support and bullpen support. So, if we already have all that, then why bother correcting the W-L records? We would just use ERA and FIP directly.

But, if all you had was Jacob deGrom's W-L record, and the number of runs scored and allowed in Mets games started by deGrom (and you know NOTHING else), well, we make do with the W-L record. We don't use it unadjusted, but neither do we discard it. That's because the W-L record is driven by data that we don't have recorded at the individual pitcher level. As much as we'd mock the W-L record for its high amount of Random Variation and potential for Systematic Bias, it's also the only game in town. We'd adjust it, and apply a large margin for error. But we'd still use it.

Interlude 2: Playoff Series

Interlude to interlude: Here's another little secret about Random Variation and what I just said about the .600 win% pitcher. It also applies to teams and to playoff series. If you have a true 106-win team facing a true 90-win team, that's a 16-win difference, which over 162 games is about a .100 difference in win%. And so we'd expect the 106-win team to have a .600 win% against the 90-win team. And we just learned you need 121 games in order to "guarantee" that such a team will always finish above .500, meaning you'd need a 121-game playoff series. So 5-game or 7-game or 9-game series really barely move the needle.
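Here is the same arithmetic for the playoff example. The first function is the post's shortcut (win gap over 162 games, added to .500). The second is a standard binomial calculation that the post does not itself perform: the chance that the better team wins a best-of-N series, assuming every game is independent with the same win probability. Both names are hypothetical.

```python
import math


def head_to_head_rate(wins_a: float, wins_b: float, games: int = 162) -> float:
    """Shortcut: the true-talent win gap, as a rate, added to .500."""
    return 0.5 + (wins_a - wins_b) / games


def series_win_prob(p: float, best_of: int) -> float:
    """Chance of winning a best-of-N series with per-game win probability p.

    Pretending all N games are played does not change who reaches the majority
    first, so we can sum binomial probabilities over N games.
    """
    need = best_of // 2 + 1
    return sum(math.comb(best_of, k) * p**k * (1 - p) ** (best_of - k)
               for k in range(need, best_of + 1))


p = head_to_head_rate(106, 90)  # about .599
for n in (5, 7, 9):
    print(f"best-of-{n}: +/- {1 / math.sqrt(n):.3f} by the 1/sqrt(N) rule, "
          f"better team wins the series {series_win_prob(p, n):.0%} of the time")
```

Even in a best-of-9, the 1/sqrt(N) margin (about .333) dwarfs the roughly .100 edge, which is the point being made above.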

McDavid

Anyway, where was I? Right, plus/minus. Why did I talk about pitchers and won-loss records? Plus/minus operates on the same principle. Connor McDavid is always playing on the Oilers, and how much the team changes year to year will determine the level of bias. McDavid doesn't play with every Oilers player; it's more common for players to cluster with other players. So there are many occasions for bias. Since McDavid has been in the league, the Oilers have bounced around a bit. So it's not like the consistent excellence of the Yankees.

I also mentioned that 121 games or decisions are needed. This is of course a rough rule of thumb, just to give us some bearings to work with. In our case of plus/minus, we are talking about events, namely goal events. Each year, McDavid is on the ice in 5-on-5 play for a total of 120 to 150 goals. So that's a good number to work with each year, at least for Random Variation, if not for bias. Over his career, he's been on the ice for 856 goals (scored and allowed), which is a very big number to work with. And since the team has bounced around a bit, the bias is likely not that strong. There's always going to be some bias, of course, but we can't let everything stop us. We deal with things as they come up, and we create error ranges for things we can't deal with.

The other thing with plus/minus is that our player is one of five players on the ice (or six if you count the goalie). Whereas a pitcher's W/L record is about 1/3rd influenced by that pitcher, a skater's plus/minus is 1/6th influenced by that skater. In other words, a skater's plus/minus tells us half as much as a pitcher's W/L record. This is certainly less than ideal. But one year of 25 pitcher decisions is less than 120 to 150 goals for a skater. Quantity helps here.

WOWY Goal Differential

Anyway, so what can we do here? Again, just to keep moving the discussion forward, I'm going to make some assumptions. These assumptions can be addressed in a later iteration. We just need to get our framework in place, then we can refine it. Just for simplicity, I'm going to treat the 2015-16 to 2022-23 Oilers (thru the first 8 GP) as one team. The Oilers' 5-on-5 goals scored and goals allowed totals are 1061-1124. When McDavid was on the ice, the Oilers scored 477 and allowed 379 goals. So, we have our first inkling that McDavid is a positive force on the ice.

What I like to do is the WOWY method, With or Without You. We look at the Oilers with McDavid on the ice and without him on the ice. Here's that breakdown:

477-379 With McDavid

584-745 w/o McDavid

That is one heckuva difference. Let's pro-rate the "without McDavid" line so we can make a more even comparison. There are a few ways you can do this, but I'll just do a simple one, and treat the "goals allowed" as the "cost of doing business". So, I'll scale the without-McDavid goals scored by the ratio of the 379 goals allowed with McDavid to the 745 allowed without him. We have this:

477-379 With McDavid

297-379 w/o McDavid (pro-rated)
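The with/without split and the pro-rating step above, sketched in Python with the post's Oilers 5-on-5 totals. Scaling goals scored by the ratio of goals allowed is the simple approach described here, not the only option.

```python
team_gf, team_ga = 1061, 1124   # Oilers 5-on-5, 2015-16 through the first 8 GP of 2022-23
with_gf, with_ga = 477, 379     # McDavid on the ice

# Without McDavid = team totals minus his on-ice totals.
without_gf = team_gf - with_gf
without_ga = team_ga - with_ga

# Treat goals allowed as the "cost of doing business": rescale the without-line
# so that it allows the same 379 goals as the with-line.
scale = with_ga / without_ga
prorated_without_gf = without_gf * scale

print(f"With McDavid:            {with_gf}-{with_ga}")
print(f"w/o McDavid (raw):       {without_gf}-{without_ga}")
print(f"w/o McDavid (pro-rated): {prorated_without_gf:.0f}-{with_ga}")
```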

More specifically, the with-McDavid really means McDavid + 4 Oilers and the without-McDavid means 5 Oilers. So, the next step is to take 4/5ths of the without-McDavid line. We get this:


Thursday, April 28, 2022

NHL realignment: Insane Proposal

I like insane ideas. If you like insanity, then enjoy. If you don't, you really won't want to read the rest of this.

We start with the existing 4 regional divisions. Those stay in place at the start of each season. Each team plays each of the other 7 teams in its division 4 times, for 28 games.

Part 1 of the insanity: you have a pre-game shootout to be used as a literal tie-breaker. A game that ends in a draw after 60 minutes will have the tie-breaker go to the winner of the pre-game shootout. In effect, one team starts with 0.5 goals, and every goal is a potential lead changer.

The top 4 of each division go into the Premier Conference, the bottom 4 go into the Challengers Conference. (Name them however you wish.) We now have 16 teams in each Conference.

Part 2 of the insanity: the matchup results carry over. So if the Leafs went 3-1 against the Bruins, and both finished in the top 4 of the Eastern Regionals, then that's what the Leafs (3 wins) and Bruins (1 win) bring into the Premier Conference. That means each team has to play 4 more games against each of the 12 remaining teams in the Conference. That's 48 more games.

All teams have now played 76 games.
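The game counts follow from the structure, assuming the current 32-team league with four 8-team divisions. A quick check:

```python
TEAMS_PER_DIVISION = 8
DIVISIONS = 4
GAMES_VS_EACH_OPPONENT = 4
QUALIFIERS_PER_DIVISION = 4  # top 4 to Premier, bottom 4 to Challengers

# Phase 1: every team plays its 7 division rivals 4 times each.
division_games = (TEAMS_PER_DIVISION - 1) * GAMES_VS_EACH_OPPONENT

# Phase 2: each Conference has 4 teams from each division. Results against the
# 3 teams from your own division carry over, so only the other 12 are new.
conference_size = QUALIFIERS_PER_DIVISION * DIVISIONS
carried_over = QUALIFIERS_PER_DIVISION - 1
new_opponents = conference_size - 1 - carried_over
conference_games = new_opponents * GAMES_VS_EACH_OPPONENT

print(division_games, conference_games, division_games + conference_games)
```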

Part 3 of the insanity: The top 12 in the Premier move into the Champions League, as do the top 2 of the Challengers. We now have 14 of the 16 teams.

Part 4 of the insanity:

Wild Card

  • #15 and #16 in Premier play each other.
  • #3 and #4 in Challengers play each other.
  • The two winners play each other; the winner of that game is the Wild Card.
  • #13 and #14 in Premier play each other: the winner advances to the Champions League, and the loser plays the Wild Card. The winner of this last play-in game advances to the Champions League.

We now have 16 teams.

Part 5 of the insanity:

The top 8 in the Premier are seeded 1 through 8.

Seeding for the remaining 4 in Premier, the 2 in Challengers, and the 2 Wild Card teams will be based on 3 points per Premier Conference win and 2 points per Challengers Conference win.

(Note: We should model how we'd actually want the points system to work.)

There you have it: Perfection. Or Perfect Insanity, depending on your point of view.

(3) Comments • 2022/05/20 • Hockey Plus_Minus

Thursday, December 16, 2021

Did Gretzky face easier goalies to score on than Ovechkin?

When rookie Patrick Roy was 20 years old, he won the Conn Smythe as playoff MVP.  When Patrick Roy was 35 years old, he won the Conn Smythe as playoff MVP.  He won his three Vezinas as best goalie in his 20s.  He also finished second in Vezina voting when he was 36 years old.

Patrick Roy, like Mariano Rivera, was ageless: basically as good at the start of his career as in the middle and at the end.  Or, Roy actually improved with age to match each season's incoming class of new goalies.  The presumption is that almost all players improve in their 20s and decline in their 30s.  But maybe Roy was different: he simply kept improving with age, and it's only all the new, better goalies coming into the league that made it seem as if he was ageless like Mo.

In his first full season, at age 29, Hasek led the league in save percentage.  He repeated that at age 30.  And 31.  And 32, 33, 34.  Even when he was 42 years old, he finished 5th in Vezina voting.  Ok, so maybe Hasek was so much better than the league that all the new incoming good goalies couldn't hope to topple him until he became more attainable at age 37.  So maybe we can't learn from Hasek.  Except he did play until he was 43.  That's really, really hard to do if you have a continuous flow of new, better goalies coming into the league.  Maybe the new goalies aren't THAT much better than the exiting goalies?

Martin Brodeur won his 4 Vezinas in his 30s, finishing as high as 3rd in the league at age 37.  Again, either Brodeur was ageless, or he kept improving with age to match the incoming classes of good goalies.

Andy Moog played throughout the 1980s and 1990s, as the scoring environment radically changed.  His performance essentially kept pace with the league changes, allowing for very modest aging.

In other words, the evidence is overwhelming that goalies, from Tony Esposito in the late 1960s to Marc-Andre Fleury today, have not been improving.  The quality of goalies is essentially unchanged.  This is what allows goalies to have very long careers in any era.

If the new class of goalies was so good each season, then we'd have goalies retiring a lot earlier and playing fewer seasons.  If the new class of goalies was so bad each season, then we'd have goalies extending their careers much longer.  None of this happens.  Goalies have careers of similar length from era to era.  And they have success consistent with modest aging.

Wayne Gretzky faced goalies of just as high quality as those Ovechkin has faced.

Now, you can certainly adjust for the scoring environment, which is something that is common in baseball (think Coors in the 1990s v Astrodome in the 1970s).  And hockey-reference does just that.  But it has nothing to do with the quality of goalies.

More importantly: Gretzky wasn't a goal-scoring machine so much as a playmaker.  As goal scorers, Mario Lemieux, Mike Bossy, and Brett Hull among his contemporaries were probably better.  It's his overall offensive game that allowed him to tower over his peers.  And his peers in the 1980s were just as good as the best players of today.

(1) Comments • 2021/12/16 • Hockey

Sunday, August 22, 2021

WOWY Patrick Roy: Every season with and without Roy

WOWY (With Or Without You) is a crude, but very effective, concept to describe the impact of players. In 1989-1990, Roy had a 2.53 GAA, with a league-leading .912 save percentage, on his way to winning the Vezina (Top Goalie) and finishing fifth in the Hart (MVP).  His backup, Brian Hayward, had a tough season in comparison, but was actually just a bit below league average, with a 3.37 GAA and .878 save percentage.  (Yes, kids, save percentage at .880 was at one time considered average.)  Redlight Racicot gave up 3 goals on 6 shots as the third member of the goalie team.

So, in games without Roy (which in this case is virtually all Hayward), the Habs had a 3.45 GAA, which pro-rated to Roy's time on ice is 183 goals allowed.  The Habs with Roy gave up 134 goals.  So, with-Roy is 49 fewer goals ( -49 ) than without-Roy.  We can repeat this process for every season of Roy with the Habs ( -215 goals ) and with the Avs ( -92 ) to give us a WOWY of -307 goals for Roy.
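The single-season calculation, as a sketch using the 1989-90 numbers above. Here `with_games` is the goalie's 60-minute-equivalent games, and the function name is hypothetical.

```python
def season_wowy(with_goals_allowed: float, with_games: float, without_gaa: float) -> float:
    """Goals allowed with the goalie, minus what the 'without' goalies would have
    allowed at their GAA over the same number of games. Negative is good."""
    expected_without = without_gaa * with_games
    return with_goals_allowed - expected_without


# 1989-90 Habs: Roy allowed 134 in about 53 games; the without-Roy GAA was 3.45.
print(round(season_wowy(134, 53, 3.45)))
```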

There is one major assumption here, and a minor one.  The major one is that the "without" Roy goalies are "average".  Of course, their very presence as Roy's backup likely implies that they are bench-level at best; otherwise, they wouldn't be his backup.  Fortunately, this bias would exist for all top goalies (Hasek, Luongo, Brodeur, et al).  And we're also fortunate that top goalies have long careers, so a lot of the uncertainty about their backups gets reduced.  In a long career, top goalies will have a wide array and number of backups.

The minor one is that the teams play "similarly" with Roy as without him.  Ideally, we'd want this to be true.  But it's not necessarily the case.  This is something that we'd have to prove, or at least provide the uncertainty level to the extent that it's true.

One technical note: when I do WOWY, I actually use the harmonic mean.  So in the above case, the pro-rating is not to the 53 (60-minute-equivalent) games Roy played, but to 37 games.  The end result is that Roy, with the Habs, ends up at -149 goals in 360 harmonic-games.  I then pro-rate that total to -215 goals in his 532 actual (60-minute-equivalent) games.  For someone like Roy, it works out well enough.
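One way to read the harmonic-mean step, as a sketch. Assumptions not spelled out in the post: each season is weighted by the harmonic mean of the with and without games, 2ab/(a+b); the weighted per-game differences are summed; and the total is scaled up to the goalie's actual games, including seasons that drop out. Inputs such as 60-minute equivalents are left to the caller, so this is not expected to reproduce the figures above to the goal.

```python
def harmonic_games(with_games: float, without_games: float) -> float:
    """Harmonic mean 2ab/(a+b); zero when either side is empty."""
    if with_games <= 0 or without_games <= 0:
        return 0.0
    return 2 * with_games * without_games / (with_games + without_games)


def career_wowy(seasons):
    """seasons: iterable of (with_ga, with_games, without_ga, without_games).

    Each season's per-game difference (with GAA minus without GAA) is weighted by
    harmonic games. A full "Cal Ripken" season has no without-games, so it gets
    zero weight, though its games still count toward the final scaling.
    """
    weighted_diff = 0.0
    total_harmonic = 0.0
    total_games = 0.0
    for with_ga, with_g, without_ga, without_g in seasons:
        total_games += with_g
        h = harmonic_games(with_g, without_g)
        if h == 0:
            continue
        per_game_diff = with_ga / with_g - without_ga / without_g
        weighted_diff += per_game_diff * h
        total_harmonic += h
    if total_harmonic == 0:
        return 0.0
    # Pro-rate from harmonic games up to actual games.
    return weighted_diff * total_games / total_harmonic
```

With a single season, the harmonic weight cancels and this collapses to the simple pro-rating: `career_wowy([(134, 53, 93.15, 27)])` gives about -48.85, where 93.15 is a 3.45 GAA over a hypothetical 27 without-games. Across many seasons, the ones where playing time is split more evenly carry more weight.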

Where it breaks is with the Cal Ripken seasons.  When Jacques Plante plays a full season (every single minute), there is no Habs-without-Plante, so that entire season is discarded.  Even other seasons with limited backup time have their impact severely reduced.  The less Plante plays, the more weight those games get.  Of course, the less he plays, the less good he is likely to be.  So, we get into a tough situation here.

To the extent that what I did works (to whatever degree you can accept), the top Habs goalies were:

  1. -294 Plante
  2. -215 Roy
  3. -210 Dryden
  4. -142 Price
  5. -43 Huet

If you were to create a Mount Rushmore of Habs goalies, those top 4 would invariably be it. At the bottom are Gerry McNeil and Bunny Larocque.  But when your without-McNeil is principally Plante and without-Bunny is principally Dryden, this points to the limitation we have, that you don't get the wide array you'd like to have.

There are corrections we can apply, by going through an iterative process, and looking at performances outside the seasons in question.  We'll get to that next time.

(11) Comments • 2021/11/29 • Statistical_Theory Hockey

Friday, May 28, 2021

Head-to-head or Common-opponent: what’s a better indicator in the NHL?

The NHL season gave us an unprecedented experiment on head-to-head versus common-opponent theories, which I've come to call WOWY (With or Without You).

Let's take the Habs/Leafs.  In head-to-head games in the regular season, the Leafs scored 34 goals to the 25 from the Habs.  So the Leafs scored 58% of the goals.  Against non-Hab opponents (which is every other Canadian team), the Leafs scored 55% of the goals. The Habs against those same common opponents, scored 50% of the goals.  In other words, against common opponents, the Leafs are a bit better than the Habs.  Against each other, the Leafs are far better than the Habs.  Which is closer to reality, their 10 head to head games (aka With), or their 46 common-opponent (aka Without) games?  In this case, the With is closer.

The Oilers scored 63% of the goals in their games against the Jets.  Against common-opponent, Oilers are 53% and Jets 55%.  As we know, the Jets demolished the Oilers, scoring 64% of the playoff goals. In this case, the Without is closer.

How about the other teams?  Let's set aside the two matchups that can't separate the two approaches.

  • Bruins/Caps are 51% in favor of Boston in the With, while the Without slightly favors Boston, 56% to 55%.  Either way, we'd assume an even split. Boston scored 62% of the playoff goals, but as I said, we won't learn anything here.
  • Avs/Blues: Avs scored 55% of their head-to-head goals (and naturally Blues are 45%).  Against common opponents, it was 60% Avs, 51% Blues.  In other words, whether you go with H2H or common-opponent, you get the same conclusion.  We won't learn anything here.  For posterity's sake, Avs scored 74% of the goals in the playoffs.

So let's get on to the last 4 matchups and see how they stack up:

  • Knights/Wild: head-to-head, they each scored 24 goals.  Against common opponents, Knights scored 63% of the goals and Wild scored 54%.  So Knights should have been heavy favorites using Without and even odds using With. So far, Knights have scored 56% of the goals.  So, the Without is closer.
  • Pens/Islanders: Pens scored 58% of the With goals, while the Isles are ahead in the Without: 57% to 55%.  In the playoffs, we know the Isles are way ahead, with 57% of the goals.  So the Without is again closer. Not only that, but the With is wildly deceiving.
  • Bolts/Panthers: Florida scored 56% of their head-to-head goals.  Against common-opponent, Bolts were a bit better at 58% to 55%.  In playoffs, Bolts were far better, scoring 59% of the goals.  The Without is much better.
  • Canes/Predators: Canes scored 59% of the head-to-head goals.  Against common-opponent, Canes were a bit better, 57% to 52%.  In the playoffs, Canes scored 58% of the goals.  In this case, the With is closer.

Adding it all up, in the six matchups that gave us a conclusion: 4 were better with the Without and 2 were better with the With.  Given that the number of games played With/Without was something like 8/48 for most teams, the volume of the Without certainly gave those games a leg up here.
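The With/Without scorecard above can be tallied in a few lines. The post does not say how two common-opponent goal shares are turned into an expected head-to-head share, so this sketch assumes the odds-ratio (log5) method; the matchup numbers are the ones quoted above, restated from the first-named team's side. Habs/Leafs is left out only because the post gives no playoff share for it.

```python
def log5_share(share_a: float, share_b: float) -> float:
    """Expected head-to-head goal share for team A, given each team's goal
    share against common opponents (odds-ratio / log5 assumption)."""
    odds_a = share_a / (1 - share_a)
    odds_b = share_b / (1 - share_b)
    ratio = odds_a / odds_b
    return ratio / (1 + ratio)


def closer_estimate(with_share, without_a, without_b, playoff_share):
    """Return which estimate, With or Without, lands closer to the playoff share."""
    without_share = log5_share(without_a, without_b)
    if abs(with_share - playoff_share) < abs(without_share - playoff_share):
        return "With"
    return "Without"


# (matchup, With share for A, Without share A, Without share B, playoff share A)
MATCHUPS = [
    ("Oilers v Jets", 0.63, 0.53, 0.55, 0.36),
    ("Knights v Wild", 0.50, 0.63, 0.54, 0.56),
    ("Isles v Pens", 0.42, 0.57, 0.55, 0.57),
    ("Bolts v Panthers", 0.44, 0.58, 0.55, 0.59),
    ("Canes v Preds", 0.59, 0.57, 0.52, 0.58),
]

tally = {"With": 0, "Without": 0}
for name, with_a, without_a, without_b, playoff_a in MATCHUPS:
    verdict = closer_estimate(with_a, without_a, without_b, playoff_a)
    tally[verdict] += 1
    expected = log5_share(without_a, without_b)
    print(f"{name}: With {with_a:.0%}, Without {expected:.0%}, "
          f"playoffs {playoff_a:.0%} -> {verdict}")

print(tally)  # {'With': 1, 'Without': 4}
```

Adding Habs/Leafs, which the post scores for the With, gives the same 4-to-2 split.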

The true question therefore is not an either/or. Rather, the question is whether the head-to-head games should be given more weight than the common-opponent games.  That is, what weighting of the head-to-head games relative to common-opponent gives us the best predictor?

We'll look at that next time, once the second round is over, and we've got more data to work with.

Thursday, October 22, 2020

How close is Mookie Betts to being great enough to be in the Hall of Fame

Setting aside whatever rules are in place, I asked readers how they would want to vote for the Hall of Fame (if they had the vote) for Ken Griffey Junior at various stages of his career, as well as Ted Williams. And the consensus was that they’d vote for Junior after his 1997 season and Ted Williams after his 1947 season. This is how they stacked up:

  • After 1997, Junior had 41 wins above average (in the equivalent of 7.5 162-game seasons)
  • After 1947, Williams had 42 wins above average (in the equivalent of 5.5 seasons)

So pretty clearly, they are looking for players to cross that 40 WAA level. That’s one of the things I do with my polls. I don’t ask: how many WAA are you looking for? Rather, I ask an indirect question and reverse engineer how they are really thinking. So, 40 WAA is our threshold. That’s not to say you can’t make it into the HOF at 30-39 WAA, just that once you cross 40, you are in.

Here’s Mookie Betts so far:

  • After 2020, Betts had 33 wins above average (in the equivalent of 5.2 seasons). He is one year, maybe two, from getting to the 40 WAA level, and from being considered a Hall of Famer by those who follow me.

Since someone will bring up Mike Trout:

  • After 2016, Trout had 36 WAA in 5.0 seasons (aka slightly better than Mookie Betts)
  • After 2017, Trout had 41 WAA in 5.7 seasons (aka slightly worse than Ted Williams)
  • After 2018, Trout had 49 WAA in 6.6 seasons (aka noticeably better than Junior)
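To put the WAA figures above on a common scale, a small sketch can divide each WAA total by its 162-game-season equivalents. The "seasons to go" estimate assumes a player keeps producing at his career-to-date rate, which is an assumption of this sketch, not a claim from the post.

```python
THRESHOLD = 40  # WAA level at which the poll respondents vote "yes"

# (label, WAA, 162-game-season equivalents), as quoted above
CAREERS = [
    ("Griffey through 1997", 41, 7.5),
    ("Williams through 1947", 42, 5.5),
    ("Betts through 2020", 33, 5.2),
    ("Trout through 2016", 36, 5.0),
    ("Trout through 2017", 41, 5.7),
    ("Trout through 2018", 49, 6.6),
]


def waa_per_season(waa: float, seasons: float) -> float:
    return waa / seasons


def seasons_to_threshold(waa: float, seasons: float,
                         threshold: float = THRESHOLD) -> float:
    """Seasons still needed to reach the threshold, assuming the player
    keeps producing at his career-to-date rate."""
    if waa >= threshold:
        return 0.0
    return (threshold - waa) / waa_per_season(waa, seasons)


for label, waa, seasons in CAREERS:
    print(f"{label}: {waa_per_season(waa, seasons):.1f} WAA per season, "
          f"{seasons_to_threshold(waa, seasons):.1f} seasons to go")
```

At his pace through 2020, Betts needs about 1.1 more seasons, which lines up with "one year, maybe two."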

I also asked my followers about Bobby Orr. The consensus was after the 1971-72 season, the equivalent of five 80-game seasons. So, that’s the Ted Williams level, of five years at the highest level of play. That’s what everyone is after. Notably, Orr at that point was only 24 years old! So he reached the Hall of Fame level at age 24, with five Norris (best defender), three Hart (MVP), and two Conn Smythe (playoff MVP) trophies, to go with his two Cups. He also somehow won the scoring title… as a defender.

Thursday, June 18, 2020

Floating Replacement Level

This discussion is easier to think about in hockey: when Sidney Crosby goes down, his 22 minutes get picked up by the other 11 forwards (1 minute each) and the 13th forward (11 minutes). So basically, Crosby gets replaced by 50% average Penguins forward and 50% bubble player.

On the other hand, when the 12th forward goes down, his 11 minutes get picked up entirely by the 13th forward. Same thing with the six defenders, or with the goalie.
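The chaining arithmetic above amounts to a minutes-weighted average of whoever absorbs the lost ice time. The talent scale here (average forward = 0, bubble forward = -1) is purely hypothetical, and the other 11 forwards are treated as "average Penguins forward" exactly as in the example; only the minute splits come from the text.

```python
from typing import List, Tuple

AVERAGE = 0.0   # hypothetical talent of an average Penguins forward
BUBBLE = -1.0   # hypothetical talent of the 13th (bubble) forward


def replacement_level(absorbed: List[Tuple[float, float]]) -> float:
    """Minutes-weighted talent of the players who pick up the lost minutes.
    Each entry is (extra minutes played, talent of that player)."""
    total_minutes = sum(minutes for minutes, _ in absorbed)
    return sum(minutes * talent for minutes, talent in absorbed) / total_minutes


# Crosby's 22 minutes: 1 extra minute for each of the other 11 forwards,
# 11 minutes for the 13th forward.
crosby_out = [(1.0, AVERAGE)] * 11 + [(11.0, BUBBLE)]
# The 12th forward's 11 minutes all go to the 13th forward.
twelfth_out = [(11.0, BUBBLE)]

print(replacement_level(crosby_out))   # -0.5, halfway between average and bubble
print(replacement_level(twelfth_out))  # -1.0, pure bubble
```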

When it comes to baseball, the concept of chaining would also apply, BUT NOT AS MUCH, as Patriot describes very well here (look for the section titled Chaining).  In hockey, players are much more fluid in terms of giving out playing time.  There are 120 minutes to give out to the defenders.  When one guy goes down, everyone below him steps up a bit, getting a couple more minutes, and the 7th player slides into the 6th slot.  With baseball, it's somewhere between goalie and defender: not as rigid as a goalie, but not as fluid as a defender.  You could slide someone up the batting lineup, but you wouldn't necessarily slide the regular 2B to SS.  It would be too unfamiliar.

And so, in the Crosby example, where you could argue it's basically halfway between average and bubble, in baseball, it's going to be much closer to the bubble line, even for the top-end player.  And so we kinda take the lazy way out and apply 100% bubble player.  But don't think that's RIGHT.  It's just EASY and close enough.  Be careful in applying the concept to other sports like hockey or basketball.

If you want a thought exercise: if your active roster was 40 players or 100 players in MLB, NHL, or NBA, would you take the LAST player as the bubble player?  No.  Then we can see how the easy way we applied on a 25-player roster is the WRONG way. It won't be close enough to right.  It'll be close enough to wrong.

So, you just have to be careful to understand WHY we made the choices we made, and see how it can apply to your circumstances.

Thursday, April 02, 2020

Do fans prefer small or large post-seasons?

I asked that question of NHL, NBA, NFL, CFL, and Euro soccer fans. And to gauge their interest in anything from a tiny to a wide-open post-season, I offered stark choices: either 2 teams, or 75-80% of all the teams. No middle ground. When the chips are down, are you a small-playoff or big-playoff fan?

I’ll start with NFL. I first asked them to consider an 18-game season (which is more than the current 16, but it’s been talked about forever and it’s in line with the CFL). By a 70/30 margin, those fans preferred a 2-team playoff (in other words, play right away for the Super Bowl) to a 24-team playoff. In other words, with 18 games, the fans did not have an appetite for an extended playoff season.

However, when I suggested an 8-game season, the tables were reversed: By a 60/40 margin, those fans preferred a 24-team playoff to a 2-team playoff. That is, when the regular season is too short, the fans would like an extended playoff season.

Logically though, 100% of fans should have preferred the 24-team playoff. After all, that would suggest another 4 or 5 rounds of playoffs, meaning that the bottom teams would play 8 games, while the rest of the teams would play 9 to 13 games, depending on how far they go into the post-season. If you have an appetite for an 18-game regular season, why would you not want an 8-to-13 game regular+post season?

Anyway, so the midpoint is 12 games: if you have a 12-game regular season, fans are just as likely to prefer a 2-team post-season as a 24-team post season.

***

The NBA fans showed a similar split: 65% of fans prefer a 2-team post-season after an 82-game regular season, while 57% of fans prefer a 24-team post-season after a 36-game regular season. We would infer that fans split down the middle at a 52-game regular season.

***

The NHL fans are much hungrier for the post-season, maybe owing to its history. At one point, they had 16 of 21 teams make the playoffs. As more teams have been added, the 16 became a mainstay.

So, 55% prefer a 2-team to a 24-team playoff with an 82-game schedule, while 63% prefer 24 to 2 with a 36-game schedule. Fans are split down the middle with an inferred 68-game regular season.

***

For Euro soccer fans, things are QUITE different. With a 38-game season (34 to 38 is the standard there), 73% prefer 2 teams. With an 18-game season, still 58% prefer 2 teams. Which logically makes no sense at all.

For example, suppose that instead of a 38-game season, we construct a schedule where the first 19 games are one game against each of the other 19 teams in the league. Then, after that happens, the top 10 teams play one game against each other, while the bottom 10 teams play one game against each other. We’ve now constructed a 28-game regular season schedule.

And we add a provision that a win in the second half counts twice as much as a win in the first half. In other words, we get that playoff feel, but every team gets to play the same number of games. Wouldn’t THIS be preferred to stopping after 19 games, and simply awarding the championship to one of the top 2 teams?
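The split schedule just described can be sketched for a 20-team league. The results are hypothetical and deterministic (the stronger team always wins, no draws); the point is only to check the game counts (19 + 9 = 28) and the double weighting of second-half wins, not to simulate a real season.

```python
from itertools import combinations


def round_robin(teams):
    """Every pair of teams meets exactly once."""
    return list(combinations(teams, 2))


def play(games, strength):
    """Hypothetical results: the stronger team always wins (no draws)."""
    wins = {}
    for a, b in games:
        wins.setdefault(a, 0)
        wins.setdefault(b, 0)
        winner = a if strength[a] > strength[b] else b
        wins[winner] += 1
    return wins


teams = list(range(20))
strength = {t: t for t in teams}  # hypothetical: team 19 is the strongest

first_half = round_robin(teams)   # 19 games per team
first_wins = play(first_half, strength)

# Split the table after the first half: top 10 and bottom 10.
ranked = sorted(teams, key=lambda t: first_wins[t], reverse=True)
top, bottom = ranked[:10], ranked[10:]
second_half = round_robin(top) + round_robin(bottom)  # 9 more games per team
second_wins = play(second_half, strength)

games_played = {t: 0 for t in teams}
for a, b in first_half + second_half:
    games_played[a] += 1
    games_played[b] += 1

# A second-half win counts twice as much as a first-half win.
weighted = {t: first_wins[t] + 2 * second_wins[t] for t in teams}
champion = max(top, key=lambda t: weighted[t])

print(set(games_played.values()))    # {28}
print(champion, weighted[champion])  # 19 37
```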

***

CFL fans (a 9-team league) were offered either no playoffs at all, simply awarding the Grey Cup to the top team after 18 games, or 8 of the 9 teams making the post-season. 59% preferred an auto Grey Cup. For a 12-game regular season, 54% preferred an 8-team post-season. The midpoint is a 14-game season.
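The post does not say how the midpoints were inferred. A straight-line interpolation between each league's two survey points, using the share preferring the smaller option (100 minus the quoted share where the post gives the big-playoff side), is one plausible way; that method is an assumption here.

```python
def midpoint(short_games, short_pct, long_games, long_pct, target=50.0):
    """Season length at which `target`% prefer the smaller post-season,
    by straight-line interpolation between the two survey points."""
    slope = (long_pct - short_pct) / (long_games - short_games)
    return short_games + (target - short_pct) / slope


# league: (short season, % preferring the small option, long season,
#          % preferring the small option, midpoint quoted in the post)
SURVEYS = {
    "NFL": (8, 40, 18, 70, 12),
    "NBA": (36, 43, 82, 65, 52),
    "NHL": (36, 37, 82, 55, 68),
    "CFL": (12, 46, 18, 59, 14),
}

for league, (s_games, s_pct, l_games, l_pct, quoted) in SURVEYS.items():
    est = midpoint(s_games, s_pct, l_games, l_pct)
    print(f"{league}: interpolated {est:.1f} games (post: {quoted})")
```

This gives roughly 11, 51, 69, and 14 games, each within about a game and a half of the midpoints quoted above.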

Friday, March 06, 2020

Introducing Naive WAR for the NHL

This is the simplest I can make WAR for the NHL. In other words, Naive WAR.

For this iteration, we start with the absolute core: goals, assists, time on ice, and saves. In a FUTURE iteration, we can add other facets, namely defense, scoring opportunities, and splitting EV, PP, and PK. But, that’s not the objective HERE. HERE, the objective is to lay the foundation, to convert the basic stats into the WAR currency: Wins and Losses.

I’ll work through the Edmonton Oilers. The Oilers are 36-32, meaning .529 win% on 68 games. We are going to allocate 60% of the games to the forwards (40.8), 30% to the defensemen (20.4), and 10% to the goalies (6.8).

The forwards total 12,215 minutes played. Since they have 40.8 games, that means each 299 minutes converts to 1 game. Draisaitl has 1538 minutes, which we divide by 299 to give us 5.1 games. We do that for all the forwards. For defensemen, the conversion is 389 minutes per game; Darnell Nurse has 4.1 games.

For goalies, it’s 602 minutes per game. Mike Smith gets 3.5 games and Mikko Koskinen gets 3.3 games.

Ok, now we’ve established the game shares of each player. The total adds up to 68 games. Since the Oilers won .529 per game, we multiply that to establish the base for each player. Draisaitl base is 2.7 W and 2.4 L. What this base represents is what an average Oilers player would have, given that number of games.
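The game-share step can be sketched as follows. Only the forward minutes total (12,215) is quoted, so the sketch works the Draisaitl example; the defense and goalie group totals would plug in the same way. Function names are illustrative, not from any existing library.

```python
TEAM_WINS, TEAM_LOSSES = 36, 32
TEAM_GAMES = TEAM_WINS + TEAM_LOSSES  # 68
POSITION_SHARE = {"F": 0.60, "D": 0.30, "G": 0.10}


def position_games(position: str) -> float:
    """Team games allocated to a position group (40.8 / 20.4 / 6.8)."""
    return TEAM_GAMES * POSITION_SHARE[position]


def minutes_per_game(group_minutes: float, position: str) -> float:
    return group_minutes / position_games(position)


def game_share(player_minutes: float, group_minutes: float, position: str) -> float:
    return player_minutes / minutes_per_game(group_minutes, position)


def base_record(games: float) -> tuple:
    """What an average Oiler would have in that many games."""
    win_pct = TEAM_WINS / TEAM_GAMES
    return games * win_pct, games * (1 - win_pct)


FORWARD_MINUTES = 12_215
games = game_share(1538, FORWARD_MINUTES, "F")
base_w, base_l = base_record(games)
print(f"{minutes_per_game(FORWARD_MINUTES, 'F'):.0f} minutes per game")
print(f"Draisaitl: {games:.1f} games, base {base_w:.1f}-{base_l:.1f}")
```

This reproduces the 299 minutes per game, 5.1 games, and 2.7-2.4 base above.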

Draisaitl is not average. So, we need to figure out how much above average he is. In this NAIVE WAR, we can only work with G and A. The Oilers have 215 goals and 366 assists. If we multiply the assists by 0.5874, we get 215. In other words, our metric will give half the value to goals and half the value to assists. Draisaitl has 43 goals and 39 adjusted assists for a total of 82 goal… something… 82 goal contributions? Whatever. It’s 82.

The average Oilers forward has 35 minutes on ice per goal contribution. Which means that Draisaitl is 38 goal contributions above average. The goal to win conversion is to divide by 6, so we have +6.4 Wins Above Average Oilers. (Again, using only G and A, and not adjusting for PP. We start somewhere and we start here. That’s why it’s NAIVE WAR.)

Since Draisaitl has a base of 2.7-2.4, we add 6.4 wins and subtract 6.4 losses. That gives Draisaitl 9.1 wins and NEGATIVE 3.9 losses. Because the “-” is already used for the “positive” losses, we will follow Bill James's lead and flip the sign to “+” for “negative” losses. And so Draisaitl has a 9.1 + 3.9 record.
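The above-average step can be sketched the same way. The 67 raw assists are Draisaitl's 2019-20 total (the post only quotes the adjusted 39); the 35 minutes per contribution and the divide-by-6 conversion are taken as quoted.

```python
TEAM_GOALS, TEAM_ASSISTS = 215, 366
ASSIST_WEIGHT = TEAM_GOALS / TEAM_ASSISTS  # about 0.5874, so assists total 215 like goals
MINUTES_PER_CONTRIBUTION = 35  # average Oilers forward, as quoted in the post
GOALS_PER_WIN = 6


def goal_contributions(goals: int, assists: int) -> float:
    return goals + assists * ASSIST_WEIGHT


def wins_above_average(goals: int, assists: int, minutes: float) -> float:
    expected = minutes / MINUTES_PER_CONTRIBUTION
    return (goal_contributions(goals, assists) - expected) / GOALS_PER_WIN


def naive_record(base_wins: float, base_losses: float, waa: float) -> tuple:
    """Add WAA to the wins and subtract it from the losses."""
    return base_wins + waa, base_losses - waa


def format_record(wins: float, losses: float) -> str:
    """Bill James style: a negative loss total is written with a '+'."""
    if losses < 0:
        return f"{wins:.1f} + {-losses:.1f}"
    return f"{wins:.1f} - {losses:.1f}"


waa = wins_above_average(43, 67, 1538)
wins, losses = naive_record(2.7, 2.4, waa)
print(f"{waa:+.1f} WAA, record {format_record(wins, losses)}")
```

This prints +6.4 WAA and a 9.1 + 4.0 record; the post's 9.1 + 3.9 differs by a tenth, presumably from rounding in the inputs (the 35 minutes, for instance, is quoted as a round number).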

Here’s all the Oilers.


(5) Comments • 2020/03/12 • Hockey

Monday, October 21, 2019

Signal of Plus/Minus wrapped tightly with the bias of mates in a sea of Random Variation

This is just a copy/paste of my Twitter thread. One good thing about Twitter is the ease of posting. One bad thing is that it's mostly lost in the ether. So often I've posted something on Twitter that I later couldn't find, and wished I had posted here on my blog instead. Anyway, I'm just going to copy/paste it right here.

=== start snip ===

When NHL statistician Ron Andrews released the plus minus figures in the 1970s, he did something clever with it: he showed the goals scored on ice as a rate of the team goals scored (in the games he played). And similarly with goals allowed.

So a poor team like the 77-78 Capitals would not have all their players as minuses. This is what it looked like.

So Guy Charron was on the ice for 75 non-power-play goals (105-30), which was 46.6% of Caps goals, and 100 goals allowed (or 40.3% of Caps GA). So, he's +6.3% relative to the average Capitals player. Obviously, the average Caps player is way below average. However, that +6.3% would not be that far off if you adjust for the Caps team. Probably +3% to +4%.

It was when, in the 1980s, they started reporting the number as a tally (-25 in this case) that the unadjusted number ended up simply too far from where it would be if adjusted. In other words, you can look at actual Coors Field HR totals, adjust them in your head, and the raw numbers would still look somewhat reasonable. With tallied +/-, they do not.
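Here is a sketch contrasting the rate version with the tally. Charron's on-ice totals come from the post; the team totals (161 GF, 248 GA) are back-calculated from the quoted 46.6% and 40.3%, so they are approximations rather than sourced figures.

```python
def share(on_ice: int, team_total: int) -> float:
    return on_ice / team_total


def rate_plus_minus(on_gf: int, team_gf: int, on_ga: int, team_ga: int) -> float:
    """Share of team goals for, minus share of team goals against."""
    return share(on_gf, team_gf) - share(on_ga, team_ga)


def tallied_plus_minus(on_gf: int, on_ga: int) -> int:
    return on_gf - on_ga


# Charron, 1977-78 Caps: 75 GF and 100 GA on ice. Team totals of 161 and 248
# are back-calculated from the quoted percentages, not taken from a source.
print(f"{rate_plus_minus(75, 161, 100, 248):+.1%}")  # +6.3%
print(tallied_plus_minus(75, 100))                   # -25
```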

More importantly, representing the tallied numbers as a rate of the team numbers is something I do throughout baseball and hockey, especially at the high school and college levels. It gives you a decent snapshot, as well as a FIRST step in adjusting for context including site-specific scoring issues. The second and third steps are a bit more cumbersome, but as a first step, it fits the bill.

Also note that Charron here was on the ice for 175 goals. And if you are looking today at shots-based plus/minus, you'd probably need 1000+ shots on ice to get that level of reliability.

We know the issues, and we can have a separate discussion on that. The point I am making is that there's a signal there, if you know how to handle the noise.

=== end snip ===

Monday, July 29, 2019

We’re all Statheads; we just choose our own stats

"We're all Statheads; we just choose our own stats"

-- Cory Schwartz

I was reminded of what Cory likes to say when I made a flippant remark, in the not-so-subtle guise of a poll, to the effect that hockey's plus/minus is better than totally useless.  I figured being better than totally useless was an easy bar for a baby to crawl over.  Unfortunately, a healthy one-third disagreed, and those who disagreed were more vocal about it.  Some, like the usually loquacious MannyElk, went with the direct "delete this".  CJ tried to be more nuanced. And everyone and his brother at EvolvingWild just disagreed.

Score Differential

In baseball, it's natural for us to look at team run differential, and make that the core of our metrics.  Indeed, that's the core of WAR found at Baseball Reference, extended down to the player level.

In hockey, you can follow a similar idea, look at goal differential, and make that the core of our metrics.  (And similar for basketball, football, soccer.  And while I know nothing about cricket, I'll say: cricket too.)  And naturally, if you are looking at it at the team level, you'd want the sum-of-the-parts to equal the whole.

Now, it's NOT NECESSARY that you apply the sum of parts theory.  After all, to do that, you'll have to decide how to handle Random Variation.  There's only so much you can identify at the player level, and so, there's going to be a gap in our knowledge.  Bill James for example in Win Shares simply plows through it, and insists on it.  I on the other hand am content to say "I dunno", and create a timing bucket.  Regardless though, the key point is that all the runs are accounted for.  They may not be accounted for at the PLAYER level (which Bill would insist upon), but at least I can account for the existence of all of them.

And the same would work in hockey.  If a team scores 4 goals and allows 2 goals, we should account for 4 goals scored and 2 allowed. And while we'd like to have all six goals assigned to players, I am content to say "I dunno", and create some sort of unknown bucket.  This could be timing, or random variation, or simply data that is too hard to assign to players.
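The "unknown bucket" idea can be sketched as a ledger that always reconciles to the team total. The player labels and credit values below are hypothetical; the only point is that whatever the players don't account for lands in an explicit bucket instead of disappearing.

```python
import math


def with_unknown_bucket(team_total: float, player_credits: dict) -> dict:
    """Return the player credits plus an 'unknown' entry, so that the parts
    always add back up to the team total. The bucket can be negative if
    the players were over-credited."""
    ledger = dict(player_credits)
    ledger["unknown"] = team_total - sum(player_credits.values())
    return ledger


# Hypothetical credits for a game the team won 4-2.
goals_for = with_unknown_bucket(4, {"F1": 1.5, "F2": 1.0, "D1": 0.6})
goals_against = with_unknown_bucket(2, {"G": 1.2, "D2": 0.4})

print({k: round(v, 2) for k, v in goals_for.items()})
print(math.isclose(sum(goals_for.values()), 4))      # True
print(math.isclose(sum(goals_against.values()), 2))  # True
```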

If you do not do this, if you don't account for all the goals, then you are simply telling the reader: "trust me, I know what I'm doing, and it doesn't add up, because I don't need it to add up".  You can of course do this.  But you are creating an unnecessary hurdle.  Rather, it's simpler to just acknowledge the gap.  In the above case, you may say a team scored 4 and allowed 2, but your process says the team is going to be assigned 2.7 goals scored and 3.2 goals allowed, because, "the process".  That's not the best way to sell something.

Extending Differentials

Now, goal differential is a core metric at the team level.  And extending it to the player level is also a core metric.  Hockey complicates things because of the man-advantage scenarios, because players don't play with everyone, and because the number of goals is low to begin with.  Which is why we talk about adjustments.  This is common in baseball, where we can adjust our core metric, like say wOBA or ERA, based on the scoring environment or other influences.

Hockey's plus/minus is already in the currency we want for a core stat: goal differential.  However, there are other plus/minus stats you can do.  You can do it for all shots, which of course includes goals.  So as to not mix anything, I'll call goal differential NetGoals.  And so we also have NetShots.  NetShots is of course, at its core, NetGoals plus NetNongoalShots.  In other words, if you are going to praise NetShots and deride NetGoals, what you are saying is that NetNongoalShots is pivotal.  That including NetNongoalShots is what makes or breaks NetShots.  That relying on NetGoals is totally useless, even with adjustments.  And that is an untenable position.

Merging and Unraveling

You can also try to argue that since we have both NetGoals and NetNongoalShots, we therefore no longer need to focus on them as components, that we can simply look at NetShots.  Or some sort of weighting of the two, but still, amalgamated into one metric.  This is like arguing that if you have wOBA, you don't need OBP.  Or you don't need K/PA.  Au contraire, the components are the key.  And that's because the weightings of the various components are not a given.  It is often necessary to keep them separate, because the weightings depend on the number of trials.
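The decomposition above can be made explicit by keeping the components separate and combining them with weights passed in as parameters. The on-ice counts are hypothetical, and no particular weights are recommended here: setting both weights to 1 simply reproduces NetShots, and the point of leaving them as parameters is that, as argued above, the right weights are not constant.

```python
def net(for_count: int, against_count: int) -> int:
    return for_count - against_count


def weighted_net(net_goals: float, net_nongoal_shots: float,
                 w_goals: float = 1.0, w_nongoal: float = 1.0) -> float:
    """Combine the components; w_goals = w_nongoal = 1 reproduces NetShots."""
    return w_goals * net_goals + w_nongoal * net_nongoal_shots


# Hypothetical on-ice counts for one player.
net_goals = net(20, 15)               # +5
net_nongoal = net(300, 320)           # -20
net_shots = net(20 + 300, 15 + 320)   # -15

assert weighted_net(net_goals, net_nongoal) == net_shots
# A different (hypothetical) weighting that leans on goals instead:
print(weighted_net(net_goals, net_nongoal, w_goals=1.0, w_nongoal=0.1))
```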

RBIs are totally useless if you already have wOBA and RE24.  That's because you can get to RBI through those two metrics.  But if you don't have RE24, then RBIs (and Runs Scored) do have some non-useless value.  They are not totally useless.  The timing of events is important.  And distinguishing between goals and non-goal-shots is important.  And how you distinguish between goals and non-goal-shots is not a constant.

The key thing that I follow in my metrics is "how".  How did this happen, why did this happen, how do we explain what happened.  I don't roll my stats up into one number to let it sit there and... sit there.  The metric has to be able to be unraveled back to its components.  And I have to be able to explain it all in English (or French if I'm feeling confident). That's how I construct my metrics.  You don't have to do it this way of course. The world is a big place.  

Wednesday, April 10, 2019

How many years are in a sports generation?

So, I asked my Twitter followers which hockey players they would consider "generational".  We can limit our window from when Bobby Orr started to the year before Crosby/Ovechkin started.  That's forty years.  We have the three easy ones: Orr, Gretzky, Lemieux.  How many more can we add?

Patrick Roy was another easy one, as was Hasek.

Bourque and Lidstrom too.

One or two of Bossy, Lafleur, Dionne (though I think we're splitting hairs).

Of the big 9 forwards of the 1990s, maybe 3 of them.

There was an appetite for at least one Russian from the 1970s-80s.

The line started to get drawn there, with Robinson, Potvin, Howe, Coffey, and MacInnis just below it.

So add it up, and we're talking about 10 to 15 "generational" players over a 40 year time period.  In other words, a generational player happens every 3 or 4 years, on average.

Now, obviously, you don't like the answer.  So, you are going to change what you think of "generational" so that you get an answer that is more like 5 to 10 years.  And that means, over a 40 year time period, you'll need to limit yourself to 4 to 8 players.  So go ahead, and do that.

Monday, January 28, 2019

How do we know how many WAR to give out to nonpitchers and pitchers?  Or goalies?  Or QB?  Or?

If you look at Fangraphs, you'll notice they hand out 57% of the WAR to nonpitchers and 43% to pitchers. This is actually the split that I determined some 15 years ago. Baseball Reference hands out around 59/41, presumably based on a similar technique that Straight Arrow reader Rally Monkey came up with. I don't know how much Win Shares gives out, but I think it's around 64/36?

How did I come up with 57/43? We have to know the spread in TRUE TALENT. The problem is we don't actually know the spread, so we need to infer it. And we infer it based on observing what has actually happened, and removing the Random Variation that pollutes all observations. And when you go down that road, we end up with a standard deviation of a talent distribution that is roughly a ratio of 4:3 for nonpitchers and pitchers.

If you tried to do this for the NHL, the split is going to be roughly 60/30/10 for forwards, defensemen, goalies.

I've never done it for the other sports. However, what you will typically find (not always, and not so strict) is that player salary is a decent approximation for the split. Again: not always; not so strict. But it's a decent guidepost. And where it deviates, then you will find a market inefficiency.

Sunday, January 27, 2019

How much goalie talent is there in the NWHL?

The largest breakthrough in sabermetrics in the last 30 years was courtesy of Voros, and the DIPS theory. By breaking up the performance of players into components, we can focus on batting average on balls in play (BABIP) and see that the spread in results (observations) is really not that much different than we'd expect from Random Variation. In other words, most of what we see is noise.

I applied this to NHL goalies several years back. Now with 4 years of NWHL seasons, we can apply it here too. This is going to be mostly math. For those who are averse to the math, you can skip to the next section.

Math Enthusiasts Enter

I grabbed all the goalie stats for the 4 seasons here.

I limited it to goalies with at least 120 minutes played, which is about 97% of all minutes played. So the answer really describes the spread in talent after that selection bias. Which really isn't that much of a bias, but let's keep going.

For each season, we figure the league average SV% of these remaining goalies, compare it to each goalie, and convert the difference to a Z-score (i.e., the number of standard deviations the observation is from the population mean).
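The per-goalie conversion just described, plus the spread of the resulting z-scores, can be sketched as below. Three assumptions are made here and not stated in the post: the league SV% is pooled (total saves over total shots), random variation is the binomial standard deviation of SV% over a goalie's shots, and the spread uses the population SD. The goalie lines are hypothetical; the post pools goalie-seasons from all four seasons before taking the SD.

```python
import math
import statistics


def league_sv(goalies):
    """SV% of the qualifying goalies, pooled: total saves / total shots."""
    return sum(saves for saves, _ in goalies) / sum(shots for _, shots in goalies)


def sv_zscore(saves, shots, lg_sv):
    """(observed SV% - league SV%) / random variation, where random variation
    is the binomial standard deviation of SV% over that many shots."""
    observed = saves / shots
    random_sd = math.sqrt(lg_sv * (1 - lg_sv) / shots)
    return (observed - lg_sv) / random_sd


# Hypothetical season: (saves, shots) for goalies with 120+ minutes.
season = [(233, 257), (281, 310), (170, 190), (402, 440)]
lg = league_sv(season)
zscores = [sv_zscore(saves, shots, lg) for saves, shots in season]
print(f"league SV% {lg:.3f}, SD of z-scores {statistics.pstdev(zscores):.2f}")
```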

The range is +1.8 standard deviations for Lauren Slebodnick in 2016-17 to -2.3 for Katie Fitzgerald this season. We'll get back to Fitz in a bit.

The standard deviation of these z-scores is 1.07, showing that there is a certain amount of talent among these goalies. How much? Here's the fun math:

zScore is nothing more than the observation (relative to the population mean) divided by the random variation (of that population). In other words: 

zScore = (obs/rand) 

Regression toward the mean (RTTM) is (rand/obs)^2, which you notice is simply 1/zScore^2. 

Therefore RTTM = 1/1.07^2 ≈ 87.5%

In other words, we regress the observation 87.5% toward the population mean. HOWEVER, this is ENTIRELY dependent on the number of observations. The average number of observations was 257 shots per goalie per season. The regression amount is the number of shots at which RTTM would be exactly .50, and since RTTM is .875 at 257 shots, that works out to .875/.125*257 = 1800. That number, 1800, is the number of shots we'd add to each goalie in order to infer their true talent. 
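The three steps above (RTTM from the spread of z-scores, the regression constant, and shrinking an observation) can be sketched as follows. The shrink step uses the standard form that keeps shots / (shots + k) of the observation; the cap at 1 and the infinite constant are guards added for this sketch. Note that 1/1.07² is about 0.873, so the post's 87.5% corresponds to an SD of roughly 1.069 before rounding.

```python
import math


def rttm_from_z_spread(z_sd: float) -> float:
    """RTTM = (rand/obs)^2 = 1 / SD(z)^2, capped at 1 (all noise)."""
    return min(1.0, 1.0 / z_sd ** 2)


def regression_constant(rttm: float, avg_shots: float) -> float:
    """Shots at which RTTM would be exactly .50: rttm / (1 - rttm) * shots."""
    if rttm >= 1.0:
        return math.inf  # no detectable talent spread
    return rttm / (1.0 - rttm) * avg_shots


def inferred_talent(observed: float, shots: float, k: float) -> float:
    """Regress an observation (relative to the mean) toward zero:
    keep shots / (shots + k) of it."""
    return observed * shots / (shots + k)


print(round(rttm_from_z_spread(1.07), 3))  # 0.873
print(regression_constant(0.875, 257))     # 1799.0
```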

Math Enthusiasts Exit

Since all goalies face fewer than 500 shots per season, MOST of what you observe is in fact noise. While it gives the appearance that goaltending is mostly subject to Random Variation, this is purely because of the lack of opportunities.

As a result, for any one season, while the OBSERVED spread is +/- 10 goals saved, the talent spread we can INFER is a range of +/- 2 goals saved.

Katie Fitzgerald, for example, may be at a -2.3 zScore this season, with -9.3 OBSERVED goals saved (or extra goals allowed, in her case), but her inferred talent is only -1.0 goals. And in her two preceding seasons, she was +0.7 goals in each season. Sometimes Random Variation makes you look a bit better than you are, and sometimes it makes you look a bit worse.

At the career level, the goalie who has shown the most talent we can infer is Brittany Ott, with observed +13 goals saved, and a true +5 goals saved.

Note:

This method is limited to the data at hand. And there are other sources of true variation, namely the spread in the talent of the rest of the team's defense. You might be able to reasonably argue that the spread we've seen is actually consistent with that spread, and so, all the goalies in the league are in fact equals.

Just consider this the first step of many...

Friday, January 18, 2019

Introduction to WAR, part 3 of n

The series concludes, and now I will provide constructive criticism.  (Parts 1 and 2 are here.)

As I noted, I got a sneak peek, and I sent the twins my comments.  The first thing to jump out as being problematic is this:

If it didn't jump out at you at first reading, then it would once you look at player results.  The way I break out WAR is about 60% forwards, 30% defensemen, 10% goalies.  You can argue goalies should be anywhere from 8% to 12%.  If you look at how much goalies get paid, they get close to 10% of the payroll.  There are many ways you can try to look at it, but every time I do, I end up around 10%.  So, anything that is not 10% is highly suspect, and almost certainly wrong.  I went after Hockey Reference on this very issue a decade ago because they were way overboard on the Goalie Win Shares.  

The above chart suggests goalies get a bit above 130 WAR out of a bit under 600 total WAR, or over 22%, double what I say.  This by itself is enough to doom the system.  But fear not, because the guys are very engaging and this is their opening salvo.  I'm sure once the community becomes more involved, the ~10% will become standard.

The other one is the split between F and D.  I use a 2:1 split, and you can maybe argue for a 3:2 split, but really it'll be somewhere between the two. Their split is about 2.2 to 1.  Not egregious, and if not for the goalie split, not really worth bringing up.  But if we're going to fix the goalie thing, then we should talk about the F/D split.

***

The other thing is the goals per win conversion.  I've shown empirically and theoretically what it should be.  Their goals per win is simply too low.  It's all fine and well to create a theoretical model.  But if the model doesn't match reality, it's not really a model.  In the end, you have to represent reality to some degree, and this shorthand gets us there:

goals/win = goals/60min + 2.75

The NHL, with overtime, and 3-on-3, and shootouts, and "playing for the tie in regulation", provides us with all kinds of things to think about.  This is why the theoretical is always going to be limited if the model doesn't account for everything.  Their model is closer to +2.50 than to the +2.75 I am using.  
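The shorthand above can be sketched to show why a smaller constant inflates WAR. The post does not say whether goals/60min counts one team's goals or both teams', so the sketch simply applies the formula to whatever rate is passed in; the 3.0 goals/60 and 30 goals above average are illustrative inputs only.

```python
def goals_per_win(goals_per_60: float, constant: float = 2.75) -> float:
    """Shorthand from the post: goals/win = goals/60min + constant."""
    return goals_per_60 + constant


def goals_to_wins(goals_above_avg: float, goals_per_60: float,
                  constant: float = 2.75) -> float:
    return goals_above_avg / goals_per_win(goals_per_60, constant)


# Illustrative inputs only: 30 goals above average in a 3.0 goals/60 environment.
print(f"{goals_to_wins(30, 3.0):.2f} wins at +2.75")        # 5.22
print(f"{goals_to_wins(30, 3.0, 2.50):.2f} wins at +2.50")  # 5.45
```

A constant that is too low means fewer goals per win, so every goal above average turns into slightly more wins than it should.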

This again should be an easy one for the hockey community to figure out.

***

The rest of what I will say is really not a disagreement, so much as an explanation. I'll put that in the comments as the mood strikes me.

It's terrific stuff overall, and that I've said as little as I have, given how much they have written, is really a testament to how serious and thorough they were.  This provides a terrific benchmark and reference point for discussion.

(1) Comments • 2019/03/04 • Hockey