The Importance and Misconceptions of Advanced Hockey Analytics

Everything has already been said before, but since nobody listens we have to keep going back and beginning all over again. – Andre Gide

It is common to read or hear the same issues raised repeatedly whenever hockey analytics are discussed. Often these stem from misconceptions about the statistics and the theory behind them. The point of this article is to address these issues, educate, and stimulate open conversation about them. It is not the first article of its kind, nor will it be the last.

Indicative of What Happened versus Predictive of What Will Happen

One of the most common mistakes when people argue over advanced statistics is failing to recognize that statistics differ in how informative they are. The most common example is goal differential (the classic +/-) versus shot attempt differential (Corsi and Fenwick).
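For readers new to the terms, here is a minimal sketch of how the two shot attempt metrics are usually tallied, using the commonly cited definitions: Corsi counts shots on goal, missed shots, and blocked shots, while Fenwick leaves out the blocked shots. The `ShotAttempts` class, the `share` helper, and every count below are hypothetical and only there for illustration.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempts:
    on_goal: int   # shots on goal, goals included
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked by the opposing team

    @property
    def corsi(self) -> int:
        # Corsi counts every shot attempt.
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Fenwick counts unblocked attempts only.
        return self.on_goal + self.missed

def share(for_count: int, against_count: int) -> float:
    """Fraction of all events that went in the team's favour."""
    return for_count / (for_count + against_count)

# Hypothetical 5v5 totals while one player was on the ice.
team_for = ShotAttempts(on_goal=30, missed=12, blocked=14)
team_against = ShotAttempts(on_goal=26, missed=10, blocked=8)

corsi_diff = team_for.corsi - team_against.corsi
fenwick_diff = team_for.fenwick - team_against.fenwick
print(f"Corsi differential: {corsi_diff:+d}, Corsi%: {share(team_for.corsi, team_against.corsi):.1%}")
print(f"Fenwick differential: {fenwick_diff:+d}, Fenwick%: {share(team_for.fenwick, team_against.fenwick):.1%}")
```

With these made-up counts the player sits at 56 attempts for and 44 against, a Corsi differential of +12.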

Out-scoring the opponent is the point of the game, so it is understandable that some would look at plus/minus and want to evaluate players by it. The viewer is weighting the statistic by how much it mattered to winning that individual game. This, however, does not necessarily make it an appropriate method for assessing a player's responsibility for the end result.

The problem is that goals are very rare events. For example, not one Jet has yet been on the ice for more than 15 goals in 5v5 score-close situations, but most have already been on the ice for 60 to 200 Corsi events. In statistics this is a small sample size problem: a very limited sample is not reliably indicative of the population. If you were to go into the ocean and pull out four fish, three green and one red, you could not conclude from that sample that 3/4 of the ocean's fish are green.
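To put rough numbers on the sample size problem, the sketch below uses the textbook standard error of a proportion, sqrt(p(1 - p) / n). The 50% "true share" is an assumed value for a hypothetical player, and the two-standard-error band is only a rough normal approximation, especially at 15 events.

```python
import math

def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an observed share when the true share is p and n events are seen."""
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical player whose true share of on-ice events is exactly 50%.
true_share = 0.50
for label, n_events in [("goals", 15), ("Corsi events", 200)]:
    se = proportion_standard_error(true_share, n_events)
    # Roughly 95% of observed shares fall within two standard errors of the truth.
    low, high = true_share - 2 * se, true_share + 2 * se
    print(f"{label:>12}: n={n_events:3d}  plausible observed share {low:.0%} to {high:.0%}")
```

Under these assumptions a perfectly average player could plausibly show a goal share anywhere from about 24% to 76% after 15 goals, while 200 Corsi events narrow the band to about 43% to 57%.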

The combination of these two points is the basic principle behind how a person can be psychologically manipulated by low-occurrence, high-profile events.

While goals are what matter in the game, shot differentials are more predictive of future goal differentials than a team's (or player's) goal differential at the time.

JLikens showed this here:

This is the power of shot metrics and why we care about them. The point of these analytics is to discover which statistics are most predictive of future team or player results. The value is in discovering which team is more likely to win or which player gives you a greater chance of winning.
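The predictive edge of shot differentials can be illustrated with a toy simulation. It is a sketch under strong assumptions, not a reproduction of any published study: every team has a fixed "true" share of shot attempts, and every attempt by either team has the same chance of becoming a goal. The constants (`TEAMS`, `ATTEMPTS`, `GOAL_RATE`, the 0.03 talent spread) and the helper names are all hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def play_half(shot_share, attempts, goal_rate, rng):
    """Simulate one half-season; return (shot differential, goal differential)."""
    shots_for = sum(1 for _ in range(attempts) if rng.random() < shot_share)
    shots_against = attempts - shots_for
    goals_for = sum(1 for _ in range(shots_for) if rng.random() < goal_rate)
    goals_against = sum(1 for _ in range(shots_against) if rng.random() < goal_rate)
    return shots_for - shots_against, goals_for - goals_against

rng = random.Random(2013)   # fixed seed so the run is repeatable
TEAMS = 2000                # far more than a real league, to steady the estimates
ATTEMPTS = 1000             # assumed shot attempts (both teams) per half-season
GOAL_RATE = 0.045           # assumed chance that any attempt becomes a goal

first_shots, first_goals, second_goals = [], [], []
for _ in range(TEAMS):
    talent = rng.gauss(0.50, 0.03)   # assumed spread of true shot-share talent
    shot_diff_1, goal_diff_1 = play_half(talent, ATTEMPTS, GOAL_RATE, rng)
    _, goal_diff_2 = play_half(talent, ATTEMPTS, GOAL_RATE, rng)
    first_shots.append(shot_diff_1)
    first_goals.append(goal_diff_1)
    second_goals.append(goal_diff_2)

shots_to_goals = pearson(first_shots, second_goals)
goals_to_goals = pearson(first_goals, second_goals)
print(f"early shot differential vs later goal differential: r = {shots_to_goals:.2f}")
print(f"early goal differential vs later goal differential: r = {goals_to_goals:.2f}")
```

In this setup the early shot differential tracks the later goal differential more closely than the early goal differential does, simply because shot attempts pile up many times faster than goals. The equal-conversion assumption builds that result in, so treat it as an illustration of the sample size effect rather than as evidence about the real NHL.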

Some good source material on the subject:

* Arctic Ice Hockey: Retro NHL and Anger at Corsi – By Gabriel Desjardins

* Objective NHL: Predicting Future Success – By J. Likens

* Objective NHL: Shots, Fenwick and Corsi – By J. Likens

* Arctic Ice Hockey: Traditional Statistic Reliability – By Patrick D.

Causation versus Correlation

A common attempted criticism of shot metrics is the observation that no two shots are equal in quality. This is related to, and rooted in principles similar to, the previous argument.

No team is going to dominate the league just by "shooting whenever, wherever possible" (and it would probably end up with a poor Corsi% anyway, as it would not be able to sustain much pressure). Corsi is used because it is predictive of future goals and wins, not because it causes wins.

Daniel Wagner wisely states:

A forward might take a low-percentage shot from the outside, hoping for a rebound that will create a better scoring chance. A defenceman will sometimes throw the puck at the net while under pressure simply to keep the play alive. A fourth liner will shoot the puck as soon as he crosses the opponent’s blue line in order to create an offensive zone faceoff for his team’s top line. A top-six forward will optimistically shoot from a bad angle at the end of his shift after not being able to create a better scoring chance. The pursuit of shot quality will inevitably lead to shot quantity.

While Wagner discusses low-quality shots specifically, his closing point, that shot quantity follows from the pursuit of shot quality, is the important part. You don't see teams with high shot quantity and low shot quality, because no team goes out there intending to throw the puck away continuously.

Corsi and Fenwick are not major causes of wins; they are by-products of striving for what does create wins: scoring chances and puck possession. For this reason they correlate very strongly with those things and can therefore be used as a proxy for them.

It should also be noted that these correlations are due to the way the NHL is currently played, from its style to its parity. It may not always be this way, but changing it would take a large evolution of the game.

Some good source material on the subject:

* NHL Numbers: A Look at the Correlation Between Scoring Chances and Shot Totals – by Eric T.

* Boys on the Bus: The Statistical Relationship Between Hits, Shots, and Points – by Michael Parkatti

* Irreverent Oilers Fans: Zone Time – Vic Ferrari

Real versus Real Effect

Increased energy after a fight, hot streaks and cold slumps, faceoff wins, intangibles (like coachability and character), etc.: these are all real things that exist. The point of hockey analytics is to discuss how relevant these things are, how large an effect they have on the game, and how much energy should be concentrated on such strategies.

Some examples are:

-Energy does tend to increase after fights in a measurable way, but which team receives the boost is essentially random and does not depend on anything in particular, especially not on who wins the fight.

-Streaks stem from the human psyche trying to shape randomness into narrative patterns. Players don't tend to be more consistent or streaky than one another beyond what you would expect from natural variance.

-Faceoff wins promote possession and shots, but their importance tends to be far less than the attention they receive in mainstream media.

The advantage of quantitative analysis is that the strength or importance of different effects can be compared, while with qualitative analysis the relative importance of two effects is a matter of subjective opinion.

Some good source material on the subject:

* NHL Numbers: The Myth of the Hot Goalie: Consistent vs Inconsistent Goaltenders – Eric T.

* Arctic Ice Hockey: Impact of Winning an Offensive Zone Faceoff – Gabriel Desjardins

* Oilers Nation: Does the Momentum Boost from Fighting Help Teams Win Games – Jonathan Willis

Shooting Percentages Over a Season Are Not 100% Player Talent, Nor 0%

A major portion of modern hockey analytics has stemmed from the discovery that shot quality and on-ice percentages (both on-ice shooting and save percentage) are largely not repeatable skills, nor controllable by the player, over the span of a season.

Now don't get me wrong, players do have some effect on the percentages. You will not be able to convince me that the order of this list (which comprises only players with a huge sample size) is entirely coincidence. The problem is that natural variance plays a much larger role when looking exclusively at the span of a single season. Career percentages tend to vary from 6 to 11 percent due to talent and role, but over a season you see numbers as low as 0% and as high as 25% or more. So a player's skill may put him one step behind league average while variance carries him two steps forward, or the reverse.
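A small simulation shows how much spread variance alone can produce. This is a sketch under stated assumptions: every simulated player has exactly the same 9% true shooting talent, a "season" is 100 shots, and a "career" is 1500 shots. Those figures, the seed, and the `simulate_shooting_pct` helper are all hypothetical.

```python
import random

def simulate_shooting_pct(true_pct: float, shots: int, rng: random.Random) -> float:
    """Observed shooting percentage when every shot scores with probability true_pct."""
    goals = sum(1 for _ in range(shots) if rng.random() < true_pct)
    return goals / shots

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_PCT = 0.09          # assumed identical talent for every simulated player
PLAYERS = 500

season = [simulate_shooting_pct(TRUE_PCT, 100, rng) for _ in range(PLAYERS)]
career = [simulate_shooting_pct(TRUE_PCT, 1500, rng) for _ in range(PLAYERS)]

for label, sample in [("one season (100 shots)", season), ("career (1500 shots)", career)]:
    print(f"{label}: lowest {min(sample):.1%}, highest {max(sample):.1%}")
```

Even though no simulated player is any better than another, the single-season percentages fan out far more widely than the career ones, which is the pattern described above.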

Over a large enough sample, these tend to regress to the mean, which brings us to our next point.

Some good source material on the subject:

* Broad Street Hockey: Shooting Percentage Regression – Eric T.

* Broad Street Hockey: Factoring Regression into Analysis – Eric T.

* Broad Street Hockey: Fooled by Randomness: How to Evaluate Defensemen – Eric T.

Statistical Regression to the Mean and Weighted Probabilities Do Not Imply Destiny, Even When They Do

Throughout history there have been teams riding the success of highly inflated shooting and/or save percentages. Usually this is followed by mainstream media writing pieces about the team turning things around and changing its culture. Then the blogosphere nerds start shouting the words "unsustainable" and "regression", essentially calling the team's success a hoax. Then one of two things happens: 1) the team collapses (see the 2011-12 Minnesota Wild or Toronto Maple Leafs for recent examples) or 2) the team does not (see the 2012-13 Leafs).

Individually, neither result proves either side correct. Statistical regression is not a call of destiny, saying a team has to make up for previous good or bad luck. Regression is just simple probability. Nor does luck completely dismiss true accomplishments or struggles. Luck in sports is essentially the less likely outcome occurring.

Let's say you have a random number generator that chooses one number from 1 to 100. If the first number comes up as 90, would you place money on the next number being under or over 90? The wise decision would be under, as it is more likely. This doesn't mean over is impossible, just unlikely. It could happen, and if it does, that is luck.
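The random number example can be checked directly. The sketch below assumes the generator is uniform over the whole numbers 1 to 100 and that each draw is independent of the last; the `share_below` helper and the seed are hypothetical.

```python
import random

def share_below(threshold: int, draws: int, rng: random.Random) -> float:
    """Fraction of uniform draws from 1..100 that land strictly below threshold."""
    below = sum(1 for _ in range(draws) if rng.randint(1, 100) < threshold)
    return below / draws

exact_under = (90 - 1) / 100    # 89 of the 100 possible values are below 90
exact_over = (100 - 90) / 100   # 10 of the 100 possible values are above 90
simulated_under = share_below(90, 100_000, random.Random(7))

print(f"exact: under={exact_under:.2f}, over={exact_over:.2f}")
print(f"simulated share under 90: {simulated_under:.3f}")
```

The first draw of 90 has no influence on the second; "under" is the better bet only because 89 of the 100 outcomes sit below 90. That is all regression to the mean claims here.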

Some good source material on the subject:

* Arctic Ice Hockey: PDO and Regression to the Mean – Gabriel Desjardins

* Arctic Ice Hockey: Luck vs Shot Quality in Shooting Percentage – Gabriel Desjardins

* Arctic Ice Hockey: Does Luck Exist and How Do You Factor For It – Garret Hohl

* NHL Numbers: Studying Luck & Other Factors in PDO – Patrick D.

Context, Context, Context

A player's Corsi is a good description of what occurred and of what is likely to happen in similar situations; however, one must always be aware that how a player is deployed will affect the results.

Context is often raised as a criticism of statistics, but it is something known; we study it and try to account for it.

The effects are intuitive. Playing with better players will make your results better; playing with weaker players will make your results worse (quality of teammate). Playing against better players will make your results worse; playing against weaker players will make your results better (quality of competition). Starting more of your shifts in the offensive zone will make your results better; starting more of your shifts in the defensive zone will make your results worse (offensive zone starts).

One of the major areas of ongoing research is how much player usage affects an individual's results and how to reduce these effects when comparing players.

Some good source material on the subject:

* Mc79Hockey: Corsi and Context – Tyler Dellow

* NHL Numbers: The Importance of Quality of Competition – Eric T.

* Arctic Ice Hockey: Does QualComp Matter – Dirk Hoag

* Arctic Ice Hockey: Further to: Does QualComp Matter – Gabriel Desjardins

Summary

Statistics are imperfect, and no one actually thinks they are perfect, just as no one looks only at statistics without watching the game. Quantitative analysis, however, is very useful.

This doesn't cover everything but is a start in the right direction.

Feel free to discuss your thoughts or ask questions below.
