Statistical OOPS

We all use stats to inform our evaluations of players, especially when we directly compare two players. It seems like a simple task, but doing it right is often one of the hardest things to do. Let's look at the save percentages of two goalies: Antero Niittymaki (SJS) and Dan Ellis (ANA).

Goalie       2010-11   2009-10   Combined
Ellis        0.8981    0.9092    0.9029
Niittymaki   0.8959    0.9085    0.9046

Ellis had a better save percentage than Niittymaki in each of the last two seasons, yet Niittymaki had the better save percentage over that exact same period when the two seasons are combined. How is that possible? It's not rounding error (I carried every calculation out to six significant figures), and it's not rink recording bias either. It's a statistical paradox, explained after the jump.

If you simply average yearly percentages, you're not getting the real picture. That's because when you average percentages you're inadvertently weighting every season equally, no matter how much the goalie actually played. The true average needs to be weighted by how many shots were faced in each season, which is the same as dividing total saves by total shots. The reversal is called Simpson's paradox, and it can happen with save %, shooting %, or any other stat that is a percentage. The result: a player can have a better percentage than another player for two (or more) years in a row, but still have a worse percentage over that same time period using the same exact data.
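To see the mechanism in miniature, here is a minimal Python sketch. The shot counts are made up for illustration (they are not the real totals for any goalie in this post), and the function names are my own. The setup assumes goalie A played most of his minutes in the season when both goalies' percentages were lower, and goalie B played most of his in the season when both were higher.

```python
# Hypothetical (saves, shots_against) per season: illustrative numbers, NOT real NHL data.
GOALIES = {
    "A": [(1800, 2000), (93, 100)],   # heavy workload in the lower-percentage season
    "B": [(89, 100), (1840, 2000)],   # heavy workload in the higher-percentage season
}


def season_pcts(seasons):
    """Save percentage for each season separately."""
    return [saves / shots for saves, shots in seasons]


def naive_average(seasons):
    """Unweighted mean of the season percentages (every season counts equally)."""
    pcts = season_pcts(seasons)
    return sum(pcts) / len(pcts)


def combined_pct(seasons):
    """Shot-weighted percentage: total saves divided by total shots faced."""
    total_saves = sum(saves for saves, _ in seasons)
    total_shots = sum(shots for _, shots in seasons)
    return total_saves / total_shots


for name, seasons in GOALIES.items():
    per_season = ", ".join(f"{p:.4f}" for p in season_pcts(seasons))
    print(f"{name}: seasons [{per_season}]  "
          f"naive avg {naive_average(seasons):.4f}  "
          f"combined {combined_pct(seasons):.4f}")
```

With these made-up numbers, A wins both seasons (.900 vs .890 and .930 vs .920), yet B wins the combined figure (about .919 vs .901), because each goalie's combined number is dominated by the season in which he faced the most shots.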

Kind of mind-blowing when you think about it. This is really important because yearly stats influence contract negotiations, fantasy team picks, and generally how we think of a player. It just goes to show that when comparing players you need to look at career totals rather than just browsing through yearly totals. But browsing yearly totals is what people usually do, because that's how stats are displayed. Doing so can give you a wrong impression of someone's true performance (and therefore potential). In some cases, it can lead you to the exact opposite of the truth.

How often does this happen? It might be more common than you think. Over these two seasons a couple of other occurrences also came up: Martin Biron (NYI) had a better save percentage than Devan Dubnyk (EDM) in each season, but Dubnyk's combined save percentage was higher. Also, Peter Budaj (COL) had better season numbers than both Chris Mason (ATL) and Brian Elliott (OTT), but both of these goaltenders actually had higher combined save percentages than Budaj.

Goalie       2010-11   2009-10   Combined
Biron        0.9225    0.8964    0.9051
Dubnyk       0.9157    0.8895    0.9067
Budaj        0.8947    0.9171    0.9000
Mason        0.8923    0.9129    0.9059
Elliott      0.8930    0.9087    0.9005
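If you have saves and shots against for a whole list of goalies, checking every pair for this kind of reversal is easy to automate. The sketch below is one way to do it, not the method used for the tables above. The three goalies and their shot counts are hypothetical, and `find_reversals` is a name I made up for the example.

```python
from itertools import permutations

# Hypothetical (saves, shots_against) per season: illustrative numbers, NOT real NHL data.
STATS = {
    "A": [(1800, 2000), (93, 100)],
    "B": [(89, 100), (1840, 2000)],
    "C": [(905, 1000), (915, 1000)],
}


def combined_pct(seasons):
    """Total saves divided by total shots faced across all seasons."""
    return sum(s for s, _ in seasons) / sum(n for _, n in seasons)


def find_reversals(stats):
    """Return (x, y) pairs where x has the better save percentage in every
    season but y has the better combined save percentage."""
    reversals = []
    for x, y in permutations(stats, 2):
        x_wins_every_season = all(
            sx / nx > sy / ny
            for (sx, nx), (sy, ny) in zip(stats[x], stats[y])
        )
        if x_wins_every_season and combined_pct(stats[x]) < combined_pct(stats[y]):
            reversals.append((x, y))
    return reversals


for better_by_season, better_combined in find_reversals(STATS):
    print(f"{better_by_season} wins every season, "
          f"but {better_combined} wins combined")
```

The check assumes both goalies have entries for the same seasons in the same order. Goalies who missed a season would need to be lined up by year first.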