Retro NHL and Anger at Corsi

I was going to write something funny about turkeys today, but I couldn't think of anything. Instead, I'm going to talk about the irrational anger directed at shot differential. Even though NHL insiders as old-school as Harry Sinden used it as a metric to evaluate their teams, the current spate of rejectionists seem to think that shot differential is some bastardization of the original intent of the framers of hockey's constitution.

Let’s walk back a few months to Derek Zona’s post about the demise of the Colorado Avalanche in last year’s playoffs. After a 69-point season in 2008-09, nobody thought Colorado had a chance to make the playoffs last year. But Colorado took a brilliant flyer on Craig Anderson, who’d put up great numbers at every stop during his career, and it was about to pay off. When the Avs opened the 2009-10 season 9-1-3, they were almost guaranteed to outperform expectations and make the playoffs, regardless of their underlying possession, scoring chances and shot differential numbers.

And Colorado's underlying numbers last season were horrible. But they won a lot of games. If you were a fan who didn't care about statistics, would you really care what they said about your team? Wouldn't you just enjoy watching your team win? We know the San Francisco Giants got about as lucky as possible this past season, but the Bay Area's fairweather fans were able to enjoy the victory just fine whether their team performed over its head or not. Apparently that's not how things work in Colorado:

"I just don’t quite get the glee from all corners for pointing this out. Shouldn’t the story be how the Avs are still absolutely blasting preseason expectations?"

And here I thought I was spending a tremendous amount of time discussing the Avs exceeding performance expectations. Should we not seek deeper explanations? Must we simply assume that Colorado is a truly great team? Our critic goes on, addressing his complaints to "Gabe and the Corsiatti":

"As it relates to this discussion, about the ‘10 Avs being lucky, I’ll summarize your logic I took issue with thusly:

Team Corsi ratings correlate to team record to some degree.
The ’10 Avalanche team Corsi ratings were poor.
The ’10 Avalanche had a much better record than Corsi predicted.
Therefore, the ’10 Avalanche were lucky.

All of the premises are true. The issue with that conclusion is that it assumes that the only reason the prediction of Corsi could be wrong is luck. It’s a classic false dichotomy. I pointed out that such a conclusion requires the link between Corsi and record to not just be a correlation but a causal one (which you confirmed you do believe, despite a lack of any support for this claim). But since the only link between records and Corsi is some level of correlation, that’s by definition an invalid conclusion, unless you’ve got a lot more data and reasoning somewhere you’ve not deigned to provide. Based on what I’ve seen, all you can say is that Corsi’s prediction of the ’10 Avs was wrong.

You also keep claiming the burden of proof is on me somehow to prove the ‘10 Avs weren’t lucky. I don’t know that it wasn’t luck, much the same as you don’t know it was. It very well could have been luck. I have pointed out that your model is limited, and that perhaps considering other factors might improve it."

(That's the most coherent and least personally insulting part of what the Corsi critic had to say, and I've left out the parts where he says "I feel like beating people over the head with correlation vs. causation lectures.")

Anyways, I think I see the issue here. This is not a "classic false dichotomy" wherein only two alternatives are considered, irrespective of other options (though if it were, I could see why it might bother some people). The thinking behind Corsi is much more nuanced than that and, despite what its detractors claim, it's not some kind of religion. Here's the basic idea:

First of all, if you've never visited Objective NHL, you're really missing out. Almost everything that goes into team-level predictive analysis in this so-called "corsi" model is derived there in gory detail. If anybody claims that something is unproven or that there are no numbers behind it, they can go there and almost certainly be overwhelmed by the numbers behind every assumption.

At any rate, we want to find what talent factors drive a team's winning percentage. Goals for and goals against drive outcomes, and they are very highly driven by shooting percentage, which itself is not a sustainable talent. There are plenty of ways to figure this out – comparing first-half team shooting percentage to second-half, even-numbered games to odd-numbered ones, or even-numbered shots to odd-numbered shots. None of them show a persistent relationship.
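For anyone who wants to try the even/odd comparison themselves, here's a minimal sketch of how it could be set up. The function names and the per-game (goals, shots) layout are my own illustrative assumptions, not anybody's published code, and you'd have to supply real game logs yourself.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_rates(games):
    """games: list of (goals, shots) tuples, one per game, in schedule order.

    Returns shooting percentage over the even-indexed games and over the
    odd-indexed games (hypothetical data layout).
    """
    def rate(subset):
        goals = sum(g for g, _ in subset)
        shots = sum(s for _, s in subset)
        return goals / shots
    return rate(games[0::2]), rate(games[1::2])

def split_half_correlation(league):
    """league: dict mapping team name -> list of (goals, shots) per game.

    Correlates each team's even-game rate with its odd-game rate.
    """
    halves = [split_half_rates(games) for games in league.values()]
    evens = [even for even, _ in halves]
    odds = [odd for _, odd in halves]
    return pearson(evens, odds)
```

A correlation near zero across the league says the metric doesn't persist from one half of the sample to the other; a strong positive one says it reflects something repeatable. The same split works for shot differential if you swap in a different per-game rate.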

What does show a persistent relationship is shot differential, in particular shot differential with the score tied at even strength. The best predictive performance comes from goals, saves and missed shots taken together – blocked shots are driven to a great extent by the defending team's ability to block shots, so they're not as useful. This is what's commonly referred to as "Fenwick", while Corsi (usually) includes blocked shots.
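If the definitions are hard to keep straight, here's a small sketch of the two counts as described above. The class and function names are made up for illustration, and in practice you'd restrict the events to even strength with the score tied.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempts:
    """One team's shot attempts over some sample (hypothetical structure)."""
    goals: int
    saves: int    # attempts on net stopped by the opposing goalie
    missed: int   # attempts that missed the net
    blocked: int  # attempts blocked by the opposing skaters

def fenwick(attempts):
    """Goals, saves and missed shots; blocked shots are left out."""
    return attempts.goals + attempts.saves + attempts.missed

def corsi(attempts):
    """Fenwick plus blocked shots."""
    return fenwick(attempts) + attempts.blocked

def differential(metric, team, opponent):
    """Attempts for minus attempts against, under the chosen metric."""
    return metric(team) - metric(opponent)
```

With a team at 2 goals, 25 saves, 10 misses and 12 blocked attempts against an opponent at 3, 20, 8 and 15, the Fenwick differential is 37 - 31 = +6 and the Corsi differential is 49 - 46 = +3.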

While Fenwick is the single best predictor we have of team performance, it's not the only one. For one thing, team records over just 82 games are a poor sample of a team's ability, and much of a team's record is actually just luck. Don't believe that? Commenter Dan Lortimer has a great suggestion for you: go to NHL.com and watch the replays for a few hundred goals – count how many came off lucky bounces, and how many were skill plays. When you think about it that way, I think you know that luck is going to play a big role.
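To put a rough number on how noisy 82 games can be, here's a back-of-the-envelope sketch. It assumes every game is an independent coin flip with a fixed win probability and ignores overtime and shootout points, so treat it as a simplification of mine rather than a model of the real standings.

```python
from math import sqrt

def win_total_std(games, win_prob):
    """Standard deviation of total wins when each game is an independent
    Bernoulli trial with the same win probability (simplifying assumption)."""
    return sqrt(games * win_prob * (1 - win_prob))

# A perfectly average team (true talent .500) over an 82-game season.
spread = win_total_std(82, 0.5)
print(f"standard deviation: {spread:.1f} wins")
```

Under those assumptions the spread works out to about 4.5 wins, so a dead-average team could finish several wins above or below 41 without its talent changing at all.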

Together, Fenwick/Corsi and luck account for around 3/4 of team winning percentage. What's the remainder? Goaltending talent – which Tom Awad estimates at about 5% – and special teams, along with a very small sliver that's due to shooting talent and the oft-mentioned "shot quality." So I don't think there's a false dichotomy here – there are five factors in this model, all of which are given credence in proportion to their predictive power.

***

So let's move on to this question of correlation vs causation. Yes, if you're sitting in your freshman philosophy class, you might argue that no such thing as "causation" can possibly exist because it requires that one particular thing causes another thing to be true. But in the real world, people are constantly running experiments to determine the extent to which one factor causes another. Does smoking cause lung cancer? Not 100% of the time, and not on a predictable timeline. Is smoking merely correlated with lung cancer, but not causative? You know, even after all these many years, we can't be certain in a philosophical sense that there's a causal link between the two. But does anybody honestly doubt that smoking causes lung cancer?

Now Corsi is a proxy for something else – puck possession, territorial control, scoring chance differential, all of which are almost identical metrics – and does anyone really doubt that having the puck more than the other team leads to winning? Yes, there are exceptions – faced with a "hot" opposing goalie or shooter, it's possible to lose a game, but "hot" streaks are transient – not skill-driven – and it would be incredibly bad luck (not skill) to run into them for an entire season. Goaltending and shooting talent have been studied by many different people in many different ways and have been found to drive only a small slice of the results – 1 or 2 wins per season.

The caution that "correlation does not imply causation" is a check on your assumptions, particularly the assumption that you've constructed equal groups in your experiment. That's why people compare individual teams by halves of the season or even and odd games or shots. At any rate, I haven't seen anyone dispute the experiment design, only the conclusions drawn from it. Well, people draw conclusions every day in just the way we concluded that Corsi is the biggest talent-based driver of winning percentage. You can sit around paralyzed, debating the notion of absolute certainty, or you can say "You know, if the other team never gets a shot on goal, you'll win. Perhaps outshooting your opponents causes you to win more often than not. The data indicates that this is true, so a team talent model should include shot differential."

This shouldn't be a controversial conclusion. Is there a hockey coach in the NHL who disputes it? Is there a coach who revels in getting outchanced? Is there a coach who accepts his forwards getting penned in their own zone and has a strategy that revolves around his goaltender bailing him out?

I know it’s easy to *attack me* but if you’re so convinced that we can never know anything, then you should be prepared to attack NHL-wide coaching strategies, and you should have some evidence to back up your claims. It’s a total cop-out to say “I don’t know but because your model’s not perfect, you also don’t know, so you should stop making conclusions and just shut up.” We know full well that sheer luck can result in bad teams posting good records and vice-versa – without it having any impact on their true talent. If you really dispute the notions of talent and predictive modeling, then you should be out there fighting against more than just my analysis of the Colorado Avalanche.
