Percentages and Probabilities: Does luck exist and how do you factor for it?

A few related topics involving statistics and their predictive ability keep getting tossed around, predominantly percentages and their regression to the mean. For some time now, the average sports fan has resisted these ideas and the role they play in the game. Fans tend to dislike the notion that luck is involved in accomplishments, especially when it diminishes their favorite team or player (sorry Toronto Maple Leafs and Nazem Kadri).

So let's discuss how luck shows up in hockey and try to correct some misconceptions about it.

Welcome to Hockey Statistics 101.

Does "luck" really exist in hockey?

Luck is involved in anything where a person does not hold full control over the outcome. For example, when a player chooses to shoot the puck while being challenged, his success depends not only on his own skills and decisions but also on those of the defensemen and the goaltender (plus a multitude of other variables). The player cannot create and force his own destiny, but he can make choices that improve his chances of success.

Now, if the player attempts something with a high probability of succeeding and fails, he is "unlucky". On the other hand, if the play has a low success rate and it works, he is "lucky". Luck in sports is essentially the less likely result coming to fruition, and unless the player is 100% certain of the result, luck is involved.

In hockey there is no certainty in the results of actions; everything involves weighted probabilities. A player cannot create a destiny nor can he will a particular outcome, but he can raise and lower the odds with his skill and decision-making.
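The definition of luck above can be written down directly. This is an illustrative sketch only: the 50% cut-off separating "likely" from "unlikely" plays and the 90% success rate are assumptions, not figures from this article. It labels single results, then repeats a 90% play many times to show that even a likely play sometimes fails.

```python
import random


def label_outcome(p_success: float, succeeded: bool) -> str:
    """Label one result following the article's definition of luck.

    Assumption: a play counts as 'likely' when p_success >= 0.5 (illustrative cut-off).
    """
    likely = p_success >= 0.5
    if succeeded and not likely:
        return "lucky"      # an unlikely play worked
    if not succeeded and likely:
        return "unlucky"    # a likely play failed
    return "expected"       # the more probable result happened


def count_successes(p_success: float, attempts: int, seed: int = 1) -> int:
    """Simulate independent attempts that each succeed with probability p_success."""
    rng = random.Random(seed)
    return sum(rng.random() < p_success for _ in range(attempts))


if __name__ == "__main__":
    print(label_outcome(0.9, False))   # unlucky
    print(label_outcome(0.1, True))    # lucky
    made = count_successes(0.9, 1000)
    print(f"{made} of 1000 attempts succeeded; {1000 - made} failed anyway")
```

Even with the odds heavily in his favour, the simulated player still fails on some attempts, which is exactly the "unlucky" case.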

This is a bell curve (also called a normal curve or Gaussian curve):

[Figure: a bell curve]

It is commonly used in statistics to display every possible outcome, given an extremely large sample, along with the probability of each outcome. The higher the curve at a particular point, the more likely that outcome is. Here the mean (or average) value sits at the centre and is the most likely outcome, while outcomes become less likely as you move towards the extremes in either direction. To keep things simple, we will not discuss skewed or other non-normal distributions.
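To make "the higher the curve, the more likely the outcome" concrete, the sketch below uses Python's standard `statistics.NormalDist` to print the height of a bell curve at a few points. The mean of 20 and standard deviation of 5 are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

# Illustrative only: a statistic whose outcomes follow a bell curve with a
# mean of 20 and a standard deviation of 5 (both numbers are hypothetical).
curve = NormalDist(mu=20, sigma=5)

for outcome in (10, 15, 20, 25, 30):
    # pdf() returns the height of the curve at that outcome:
    # the higher the curve, the more likely the outcome.
    print(outcome, round(curve.pdf(outcome), 4))
```

The printed heights peak at the mean and fall off symmetrically on either side.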

Should you keep "luck" in mind with hockey statistics?

Statistics in hockey are predominantly used for two things: they tell you what occurred (like 'Player A' scoring two goals or 'Player B' being on the ice for one goal against), or they are used to build a value system for comparing players and teams with each other or with themselves.

The problem with using a statistic to assign a value to a player is that the value system is only as strong as the degree to which what happened on the ice depended on that player's abilities and decision-making. If the statistic rests on events where a low-probability outcome occurred, you are evaluating the player improperly.

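This brings us back to the bell curve shown earlier:

[Figure: the bell curve, with a black arrow at the player's "true value" and red and blue arrows marking one lower and one higher outcome]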

Over a period of time, a player's "true value" due to skill, work ethic and decision-making would be the value marked by the black arrow. It is the most likely value, but it is not the only possible outcome. The statistic being discussed (whether goals, assists, +/-, etc.) could fall higher or lower because of the variables outside the player's control discussed earlier. The red and blue arrows represent two possible outcomes, one lower and one higher. They are less likely than the "true value", but still entirely possible.
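The "less likely but still entirely possible" point can be quantified. This sketch assumes, purely for illustration, a hypothetical player whose "true value" is 20 goals and whose season totals vary along a normal curve with a standard deviation of 5; it uses the cumulative distribution function `cdf()` from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical player: "true value" of 20 goals, with season-to-season
# variation modelled (as an assumption) by a normal curve with sd = 5.
season = NormalDist(mu=20, sigma=5)

p_low = season.cdf(15)                    # a season at or below 15 goals
p_high = 1 - season.cdf(25)               # a season at or above 25 goals
p_near = season.cdf(25) - season.cdf(15)  # within 5 goals of the true value

print(f"<= 15 goals: {p_low:.1%}")
print(f">= 25 goals: {p_high:.1%}")
print(f"15-25 goals: {p_near:.1%}")
```

Under these assumptions roughly two seasons in three land within 5 goals of the true value, while about one in six lands at or below 15 and another one in six at or above 25, like the outcomes marked by the red and blue arrows.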

How does one factor for "luck" with hockey statistics?

Suppose you are playing a game with someone and they pull off a trick shot on their first attempt. If you thought it was luck, you would likely respond with a "bet you can't do it again" dare. Repeatability is what separates skill from luck. Statistically, this is akin to increasing the sample size to pin down the probability of success. The larger the sample, the less likely it is that consistent success (or failure) is due to luck or chance.
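To put numbers on the "bet you can't do it again" dare: if a trick shot succeeds with a fixed probability on each independent attempt, the chance of hitting it several times in a row by luck alone is that probability raised to the number of attempts. The 10% success rate below is hypothetical.

```python
def streak_probability(p_success: float, streak: int) -> float:
    """Chance of `streak` consecutive successes, assuming independent attempts."""
    return p_success ** streak


# A hypothetical trick shot that works 10% of the time.
for k in (1, 2, 3, 5):
    print(f"{k} in a row: {streak_probability(0.10, k):.5%}")
```

Each extra success makes the run ten times less likely to be pure luck, which is why a repeated feat is much harder to dismiss as chance.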

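Again, on a probability curve it would look like this:

[Figure: two bell curves centred on the same "true value"; the red curve, drawn for a larger sample, is taller and narrower than the blue curve]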

The blue and red curves both show the probabilities of all outcomes given that the "true value" is at the centre; the difference is sample size. The red curve shows the probabilities for a larger sample. Under it, the "true value" and the values closest to it are far more likely than under the blue curve, while the extreme values are less likely.
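Why does the red curve narrow? One standard way to see it, assuming each attempt independently succeeds at a fixed true rate (a binomial model), is the standard error of the observed rate, sqrt(p(1 - p) / n). The 9% true rate below is hypothetical.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Spread of an observed success rate around a true rate p after n
    independent attempts (binomial model)."""
    return math.sqrt(p * (1 - p) / n)


true_rate = 0.09  # hypothetical "true" success rate
for attempts in (50, 200, 800):
    print(attempts, round(standard_error(true_rate, attempts), 4))
```

Quadrupling the number of attempts halves the spread, so the curve for the larger sample is taller at the centre and thinner at the extremes.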

This is one of several reasons why many prefer high-occurrence statistics like shot attempt differentials (such as Corsi and Fenwick) over low-occurrence statistics like traditional +/-. While goals are indeed more important to the game than shot attempts, evaluating a player on +/- may not say much about the player's "true value" to the team. Different statistics occur at different rates, so some statistics become reliable after fewer games played than others.
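A rough way to see why event counts matter: if on-ice events arrive roughly like a Poisson process (a simplifying assumption), a count with expected value λ has a standard deviation of √λ, so its noise relative to its size is 1/√λ. The season totals below are hypothetical placeholders, not real NHL figures.

```python
import math


def relative_noise(expected_events: float) -> float:
    """Standard deviation divided by the mean for a Poisson count (= 1 / sqrt(mean))."""
    return 1 / math.sqrt(expected_events)


# Hypothetical on-ice season totals for one player.
examples = {
    "shot attempts for and against (Corsi events)": 1000,
    "goals for and against (+/- events)": 50,
}
for label, events in examples.items():
    print(f"{label}: about ±{relative_noise(events):.0%} noise")
```

Under this model, a statistic built on twenty times as many events carries roughly a fifth of the relative noise.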

Determining repeatability and regression to the mean

Over at Broad Street Hockey, Eric Tulsky has already shown in three excellent articles that a player is unable to control a team's Sh% and Sv% while he is on the ice (as shown here, here and here). This means that, although these percentages may be affected by skill, over the course of an NHL season variables outside of a player's control (i.e. "luck") play a larger role in whether he finishes above or below league average. The volatility of these percentages comes from the extremely low-occurrence events they represent, which makes them look almost random when reviewed at season's end. It also means a single season's percentages can land at values beyond where any NHL player's "true value" would lie.
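A toy simulation mimics this volatility. Every player below is given the same true on-ice Sh% (an assumption that deliberately removes skill), yet one season of shots still spreads their observed percentages widely. The 8% rate, 400 shots and league size of 300 are hypothetical.

```python
import random


def simulate_on_ice_sh_pct(true_pct: float, shots: int, rng: random.Random) -> float:
    """One season's observed on-ice Sh% for a player whose true rate is true_pct."""
    goals = sum(rng.random() < true_pct for _ in range(shots))
    return goals / shots


rng = random.Random(42)
# Hypothetical league: 300 players with the SAME true on-ice Sh% of 8%,
# each on the ice for 400 shots for in a season.
observed = [simulate_on_ice_sh_pct(0.08, 400, rng) for _ in range(300)]
print(f"lowest: {min(observed):.1%}, highest: {max(observed):.1%}")
```

Any gap between the lowest and highest players here is produced by chance alone, since no simulated player is more skilled than another.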

As an example, let's suppose that a computer generates a random number between 0 and 20. If the first number came up as '2' and you were asked whether the next number would be higher or lower, you'd be wise to guess higher as there is a greater probability you are right.
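The arithmetic behind that guess can be checked exactly with Python's `fractions` module. This assumes the computer picks a whole number from 0 to 20 inclusive, each equally likely (21 possible values).

```python
from fractions import Fraction

# Assumption: a whole number from 0 to 20 inclusive, each equally likely.
outcomes = range(0, 21)
first = 2

p_higher = Fraction(sum(1 for n in outcomes if n > first), len(outcomes))
p_lower = Fraction(sum(1 for n in outcomes if n < first), len(outcomes))
print(p_higher, p_lower)  # 6/7 2/21
```

Guessing "higher" wins 18 times out of 21, not because the first draw has to be balanced out, but because most of the possible values sit above 2.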

This is a similar concept to regression to the mean. The difference is that instead of every number being equally likely, the probabilities follow a distribution like the bell curves shown previously. When players like 2011-12 Jordan Eberle or 2012-13 Nazem Kadri finish a season with a highly inflated On-Ice Sh%, we say it is likely to regress the next season. We are not saying that their past inflated numbers must be balanced out by future deflated numbers to reach the mean; we are simply saying that repeating such inflated percentages is extremely unlikely.
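A toy simulation illustrates regression to the mean without any "balancing out". Every player is given the same true on-ice Sh% (a deliberately extreme assumption that removes skill entirely; real players may differ somewhat) and plays two independent seasons. We then look at the players whose first season was most inflated. All rates and counts are hypothetical.

```python
import random
from statistics import mean

TRUE_PCT = 0.08   # hypothetical true on-ice Sh%, identical for every player
SHOTS = 400       # hypothetical on-ice shots for per season
PLAYERS = 500


def season_pct(rng: random.Random) -> float:
    """Observed on-ice Sh% over one simulated season."""
    return sum(rng.random() < TRUE_PCT for _ in range(SHOTS)) / SHOTS


rng = random.Random(7)
# Two independent seasons per player: luck in season 1 does not carry into season 2.
seasons = [(season_pct(rng), season_pct(rng)) for _ in range(PLAYERS)]

# Pick the 50 players with the most inflated first season.
hot = sorted(seasons, key=lambda s: s[0], reverse=True)[:50]
year1 = mean(s[0] for s in hot)
year2 = mean(s[1] for s in hot)
print(f"hot group, season 1: {year1:.1%}; season 2: {year2:.1%}; true: {TRUE_PCT:.0%}")
```

The hot group's second season lands back near the true rate with no "payback" for the first season: the inflated year was simply unlikely to repeat.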

Summary, and what this means for Winnipeg Jets fans

Wherever you lose full control of an outcome, there is chance. A player may affect the probabilities but cannot will his destiny. The smaller the sample, the more likely it is that chance dictates the results. For these reasons, sample size and repeatability should always be considered when evaluating a player with a statistic. With some statistics more volatile than others over a season, regression to the mean is an essential concept to keep in mind.

Because of their volatile nature, On-Ice Sh%, On-Ice Sv% and PDO are key components for adding context about how likely players' point totals and traditional +/- are to be repeated. When we see inflated numbers, such as those recorded by Andrew Ladd, Bryan Little and Blake Wheeler in 2013, we can gamble that points are unlikely to be scored at similar rates in the future. This is not to take away from the fact that they played well this season and created a plethora of chances with their skill sets. We can also use deflated percentages to expect the opposite, as when we at Arctic Ice Hockey predicted a strong comeback this season from Eric Fehr, even prior to his European success.
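PDO is the sum of on-ice shooting percentage and on-ice save percentage. Because every shot for one team is a shot against another, the league as a whole averages 100 on the scale used below (some sites use a 1000 scale instead), so a value well above 100 points to percentages that are likely to regress. The function name and the on-ice totals are hypothetical, for illustration only.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """PDO = on-ice Sh% + on-ice Sv%, expressed here on a 100-point scale."""
    on_ice_sh = goals_for / shots_for               # share of shots for that became goals
    on_ice_sv = 1 - goals_against / shots_against   # share of shots against that were saved
    return 100 * (on_ice_sh + on_ice_sv)


# Hypothetical on-ice totals for one player.
print(round(pdo(goals_for=45, shots_for=400, goals_against=28, shots_against=380), 1))
```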

For more articles on variance, chance and luck, there is a reference library over at NHL Numbers.
