Early this morning, the Winnipeg Free Press ran a large article by Ed Tait called The Art of Hockey Analytics. The piece is superb and highly recommended to all of our readers. It takes an unbiased look at what is going on in the world of hockey analytics and represents all parties very well.
Winnipeg Jets general manager Kevin Cheveldayoff gives some very interesting quotes, most of which are completely accurate. Chevy mentions many of the known biases with number analysis in hockey, but does not mention that there are ways to account for these issues.
I'll break down some of the quotes as a learning exercise to show how one can account for these biases.
But the game is watched by humans and the numbers are input by humans. And there’s lots of statistics that are arbitrary. Is that a face-off won or lost? Is that a hit? In some buildings it’s not a hit. The shots on net, what makes up a blocked shot? There’s lots of things left to interpretation that go into that stat package that means you have to have an asterisk beside it. - Kevin Cheveldayoff
What Cheveldayoff is talking about here is known as scorer's bias. Chevy comments on the inconsistencies in how information is recorded from building to building. No individual is perfect and everyone has some bias in interpreting what they see. Ironically, it could be argued that this bias is actually a larger issue for traditional methods of scouting.
Error exists in all measurements: filling a chemical flask, reading a measuring tape, cutting a wooden 2x4 or counting the number of giveaways a team allows in a game. Science usually accounts for this by taking multiple readings, preferably from more than one individual. In hockey, however, a team plays 50% of its games at home, so half of its data is recorded in a single building, which skews the weighting heavily toward one perspective; this measuring bias is called home-scorer's bias.
Some statistics are more susceptible to home-scorer's bias than others; giveaways, takeaways, faceoffs, and hits tend to be the most prone to it. You determine this by comparing road versus home data. For example, the Jets were 9th worst in giveaways with 473 at home, but they were 18th with 262 in the exact same number of road games.
So, to diminish scorer's bias an analyst must take two steps:
1) Test for bias by checking how often home and road data differ across the league, and whether the discrepancy is large enough to alter interpretations.
2) If yes to both, use road data only; if no, use all data.
By doing this there is no need to place an asterisk.
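The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 10% tolerance are hypothetical (the article does not specify a cutoff), and a real check would pass in the home and road totals for every team in the league rather than a single club.

```python
from statistics import mean


def home_road_ratio(home_count, road_count):
    """Events recorded at home divided by events recorded on the road."""
    if road_count == 0:
        raise ValueError("road_count must be positive")
    return home_count / road_count


def choose_sample(home_counts, road_counts, tolerance=0.10):
    """Decide which data to use for one statistic.

    home_counts / road_counts: per-team totals over equal numbers of games.
    tolerance: hypothetical cutoff for a gap that is "large enough to
    alter interpretations"; it is an assumption, not a league standard.
    """
    ratios = [home_road_ratio(h, r) for h, r in zip(home_counts, road_counts)]
    # Step 1: how far, on average, do home totals sit from road totals?
    average_gap = mean(abs(ratio - 1.0) for ratio in ratios)
    # Step 2: road data only if the gap is too large, otherwise all data.
    return "road only" if average_gap > tolerance else "all data"


# The Jets' giveaway totals quoted above: 473 at home, 262 on the road.
jets_ratio = home_road_ratio(473, 262)
print(round(jets_ratio, 2))  # 1.81
```

A ratio near 1.0 would suggest the home scorer counts events much like the rest of the league; the Jets' giveaway ratio is far from that, which is the kind of gap that would push an analyst toward road data only.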
Line Matching, Teammates and Contextual Nuances
"What the numbers don’t tell you is who Mark Stuart is matched up against or playing with or is he with the first line or third line?" - Kevin Cheveldayoff
Depending on the context, Cheveldayoff could be completely wrong or right here. If he's talking about a number like Mark Stuart's Corsi% of 47.6, the lowest among the Jets' regular defensemen, then he is correct: the number is affected by Stuart's usage, and the number doesn't tell you what that usage is.
However, that information is still available in statistical form and can be accounted for.
Data can actually be parsed to show how much time-on-ice (TOI) one player has seen against or with any other specific player, as well as what events occurred in those minutes. For example, Stuart has been on the ice for just over 27 minutes against Mat Duchene this season. In those minutes the Jets have controlled possession only about 47.9% of the time and have been outscored by a 2:1 ratio.
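As a rough sketch of what that parsing looks like, the snippet below counts shot attempts taken while two named players share the ice. It assumes the possession figure is a Corsi% (the share of all shot attempts that belong to the player's team), and the event format, a set of on-ice names plus a flag for which side took the attempt, is hypothetical rather than the layout of any real play-by-play feed.

```python
def corsi_pct(attempts_for, attempts_against):
    """Share of shot attempts taken by the player's team, as a percentage."""
    total = attempts_for + attempts_against
    if total == 0:
        return None  # no attempts recorded, so no percentage to report
    return 100.0 * attempts_for / total


def head_to_head_corsi(events, player, opponent):
    """Corsi% for `player` counting only attempts with `opponent` on the ice.

    events: iterable of (on_ice, by_players_team) pairs, where on_ice is a
    set of player names and by_players_team is True when the player's own
    team took the shot attempt. This structure is an assumption.
    """
    attempts_for = attempts_against = 0
    for on_ice, by_players_team in events:
        if player in on_ice and opponent in on_ice:
            if by_players_team:
                attempts_for += 1
            else:
                attempts_against += 1
    return corsi_pct(attempts_for, attempts_against)
```

The same loop works for "with" instead of "against": pass a teammate's name as the second player and the filter keeps only the attempts where both were on the ice together.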
Context can also be specific enough to show how a player performed in an individual game:
Michael Parkatti here shows the Corsi +/- for Oilers players versus the Jets. In the image you can see how Stuart performed well against the Oilers' top line in this game, but struggled against their 2nd line.
Quality of Competition is the statistic that shows the strength of the match-ups a player faces over a period of time. It can be calculated either as a TOI-weighted average of the opposition's Corsi% or as the average percentage of a team's TOI that an opponent is given. Simply put, you can see who faces the better Corsi players and also who faces the big-minute players more.
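The first version of that calculation, a TOI-weighted average of the opposition's Corsi%, can be sketched as below. The function name and input layout are assumptions made for illustration, and the numbers in the usage line are made up, not real match-up data.

```python
def toi_weighted_qoc(matchups):
    """TOI-weighted average of opponents' Corsi%.

    matchups: list of (toi_minutes, opponent_corsi_pct) pairs, one per
    opponent faced. The layout is hypothetical.
    """
    total_toi = sum(toi for toi, _ in matchups)
    if total_toi == 0:
        raise ValueError("no ice time recorded")
    # Each opponent's Corsi% counts in proportion to the minutes faced.
    return sum(toi * corsi for toi, corsi in matchups) / total_toi


# Made-up inputs: 10 minutes against a 50.0% player, 30 against a 54.0% player.
print(toi_weighted_qoc([(10, 50.0), (30, 54.0)]))  # 53.0
```

Weighting by minutes is what keeps a brief shift against a star from counting as much as a full night matched against him. The second version swaps each opponent's Corsi% for the share of his own team's TOI that he plays.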
Contextual nuances extend to other factors beyond match-ups; TOI, linemates, and zone deployment are all known to have an effect on a player's on-ice results.
Here is the Jets' zone deployment and difficulty in match-ups for defensemen:
The higher up a player is on the graph, the more difficult their line matching was on average. The further left a player is on the chart, the more difficult their zone deployment was. The colour represents their Corsi% and the size of the circle is their average TOI.
The chart shows how Mark Stuart had easier match-ups than Jacob Trouba, Zach Bogosian, Dustin Byfuglien, or Tobias Enstrom. It also shows how Mark Stuart had the toughest zone deployment of Jets regular defensemen. These factors play a role in the on-ice results.
You account for these factors by comparing how Stuart performs against other NHL players with similar usage across all of them, like this comparison from back in early March. This is where statistical analysis can shine. In traditional scouting you can teach Stuart what he is doing right or wrong: his gap control, when to take his man, when to switch with another defender, how to stop the cycle, etc. What traditional scouting methods have difficulty showing is how effective Stuart is relative to the average player in similar situations.
Traditional Scouting versus Number Analysis
[S]o a guy like Mark Stuart... he’s not the most prolific puck mover or distributor, but Corsi and Fenwick don’t talk about the hits he has every night or the blocked shots. There’s value in all those statistical analyses, but there is also an arbitrary nature to it. It’s like plus-minus... plus-minus tells you a part of the story, but it doesn’t tell you the whole story. - Kevin Cheveldayoff
Whether or not Cheveldayoff is correct in this statement is again dependent on context.
Corsi and Fenwick are macro-level on-ice statistics. They are for determining whether a team is more likely to outscore or be outscored when a player is on the ice. They do not tell you how a player affects or improves their shot metrics. This is why hockey statistical analysts point out that these statistics are meant to be tools used in conjunction with traditional methods like scouting and video coaching.
When Stuart hits a player, the primary function is to change puck possession from the other team to the Jets. When Stuart blocks a shot, the primary function is to stop a shot on goal and therefore eliminate the chance that an attempt becomes a goal. These actions, and others, are all aimed at helping the Jets outscore their opposition; Corsi and Fenwick show how effective a player is at this overall.
Ed Tait touches on this when he shows a quote from Phoenix Coyotes head coach Dave Tippett that has been displayed here at Arctic Ice Hockey multiple times:
We had a player that was supposed to be a great, shut-down defenceman. He was supposedly the be-all, end-all of defencemen. But when you did a 10-game analysis of him, you found out he was defending all the time because he can’t move the puck. Then we had another guy, who supposedly couldn’t defend a lick. Well, he was defending only 20 percent of the time because he’s making good plays out of our end. He may not be the strongest defender, but he’s only doing it 20 percent of the time. So the equation works out better the other way. I ended up trading the other defenceman. - Dave Tippett
In both situations there is a player who has some strengths and some weaknesses. Whether the puck-mover or the defensive defenseman improves the team more overall depends on whether their strengths outweigh their weaknesses.
Every player has strengths and weaknesses. What statistical analysis allows a team to do is determine whether a player does more harm than good (or vice versa) and by how much.
What matters to Jets fans the most is the Jets. We've known for a while that some of the Jets do indeed look at numbers. Claude Noel has previously mentioned separating lines because of poor plus/minus. Noel also mentioned this season that Dustin Byfuglien's scoring chances against had improved. Maurice mentioned that his decision to put Michael Frolik on an elite shutdown line with Andrew Ladd and Bryan Little was encouraged by some "beyond the gamesheet" underlying numbers.
In many of his quotes in the article, Kevin Cheveldayoff gives further evidence that the Jets do use numbers. That raises the questions of which numbers the Jets use and whether they are the right ones...
Personally, I'm not one who believes any number is necessary. I think there is more than one way to get the desired results. Numbers are just a tool in evaluating players, tactics and teams... if used appropriately.
Check the comments section below, as I'll be pulling some comments from the Winnipeg Free Press article and replying to them.