SSW: With so much luck, when are stats useful?
With all the talk about PDO, and given the time of year, I figured it would be appropriate to show the final results from my work on the reliability of traditional stats (G, Ast, Pts, PIM). As has been discussed a lot lately, PDO numbers tend to stabilize for teams around 1000. This raises the question: what stats can we use for individual players, and when are they reliable? Here's a nice chart after the jump for you to ponder. (See the early SSW work for methods here, here, and here.)
Basically, the second column from the left is the number of games a player must play before we can say with reasonable confidence that the traditional counting stat in that row is reliable (i.e. probably accurate).
For Forwards:
| Stat | Games when alpha > 0.707 | 95% Confidence Interval | alpha at 120 games |
| Goals | 40 | 27-51 | 0.860 |
| Assists | 25 | 18-34 | 0.911 |
| Points | 14 | 10-19 | 0.938 |
| +/- | n/a | n/a | 0.447 |
| PIM | 36 | 29-47 | 0.907 |
For Defensemen:
| Stat | Games when alpha > 0.707 | 95% Confidence Interval | alpha at 120 games |
| Goals | 77 | 60-114 | 0.777 |
| Assists | 38 | 31-47 | 0.891 |
| Points | 31 | 21-36 | 0.918 |
| +/- | n/a | n/a | 0.449 |
| PIM | 69 | 52-107 | 0.786 |
Alpha here is Cronbach's alpha, which comes from a statistical formula that calculates the reliability of a test. It's basically a fancy-schmancy correlation coefficient. If you want, you could just take it as the r at the games specified. See the earlier work for how these numbers can be used to calculate regression statistics.
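For anyone who wants to try this at home, here is a minimal sketch of the standard Cronbach's alpha formula, treating each game as an "item" and each player as a "respondent". The function name and the tiny data set are made up for illustration; this is not the spreadsheet behind the tables above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows[i][j] = stat for player i in game j.

    Every player must have the same number of games (columns).
    """
    k = len(rows[0])  # number of games (items)
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 games and 2 players")
    # Variance of each game's column across players.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of each player's total across players.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("player totals have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical points-per-game data: 4 players, 5 games each.
sample = [
    [1, 0, 2, 1, 1],
    [0, 0, 1, 0, 0],
    [2, 1, 1, 2, 1],
    [0, 1, 0, 0, 1],
]
print(round(cronbach_alpha(sample), 3))
```

To build a column like "Games when alpha > 0.707", one could call this on the first n games for increasing n and note where alpha first crosses the threshold.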
My intention was to compute these for advanced stats as well, but it will be a lot more time consuming. I may or may not do it, time permitting. If you really want to see those numbers I could explain how, and let you crunch the numbers.
Contact me for the full spreadsheet
Comments
I kind of laughed a bit
when I saw you did PIM. But I’m kind of interested as to why it takes so long.
Co-Manager at Arctic Ice Hockey
by Bettman's Nightmare on Oct 29, 2011 3:15 PM EDT
PIM is a little strange
Obviously very influenced by 5min majors, but I would have thought D would be more reliable than F, as they tend to take more penalties. Instead we see a high standard deviation without much reliability. My guess is that every team has their F goon they roll out, thus improving F PIM reliability.
And conversely, a couple of more skilled D who don’t play as physical and don’t take as many penalties.
SNN Sports - A theoretical Oilers blog (i.e. theoretically, I write stuff there). Link now 100% less broken.
Robertson's Rants - Exceedingly occasional, lengthy ramblings on hockey topics, hosted at Puck Podcast. And no, my name's not Doug.
and of course....
the less often the event occurs, and the more random it is, the longer it will take for a stat to stabilize. PIMs tend to be a bit random, and the same player doesn't take penalties every game (outliers like Parros excluded).
I should also mention here....
the “n/a” for plus/minus indicates incredible variability/unreliability. At a player’s 120th game we would have to regress his +/- 55% toward league average (e.g. player A +/- * 0.45 + league avg +/- * 0.55) in order to give our best estimate of “true” talent, i.e. the amount of +/- due to the player’s skill.
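The regression arithmetic above can be sketched in a couple of lines. The function name is made up, and the +10 player and league average of 0 are hypothetical numbers for illustration; the 0.447 is the forwards' +/- alpha at 120 games from the table.

```python
def regress_to_mean(observed, league_avg, reliability):
    """Weight the observed stat by its reliability and the league
    average by the remainder to estimate "true" talent."""
    return reliability * observed + (1 - reliability) * league_avg


# Hypothetical forward: +10 after 120 games, league average +/- of 0.
estimate = regress_to_mean(observed=10, league_avg=0, reliability=0.447)
print(round(estimate, 2))
```

The lower the reliability, the more the estimate collapses toward league average, which is why +/- is so hard to read at the player level.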
I’m not familiar with Cronbach’s Alpha, but looking at Wikipedia and poking around Google, it appears to be primarily (exclusively?) used to measure the extent to which different scores or metrics measure the same thing. I could see measuring the consistency of assists and goals or something like that, but I don’t understand your methodology, since you seem to be using Cronbach’s Alpha with a single stat. Are you treating the results of each game like they’re different items?
Apart from yours here, I couldn’t find any other articles using it in sports analysis or even with data that doesn’t look like a breakdown of tests or something directly in psychology or sociology. Could you direct me to some?
Driving Play - The Blog with Three First Lines
Thanks for reading
Yes, Cronbach’s is used mostly for test reliability, and has been used extensively in psychology and sociology. What piqued my interest was that, mathematically, Cronbach’s alpha is equivalent to the average of all possible split-half correlation coefficients within the set of data. Initially when I ran the samples I also calculated a Pearson product-moment correlation coefficient, which was nearly identical. The other advantage of Cronbach’s is that you only need half the data. In order to calculate a correlation coefficient you have to split games into even and odd games, whereas with Cronbach’s you can use a single set of games. (The obvious bias here is that games close together in the schedule may not be truly independent events, but if that bias was there, I couldn’t detect it in the data.)
If you go back and read the methodology from the first post, I do treat every game as a separate item. Initially I broke them into even and odd samples, but stopped once I accepted Cronbach’s alpha as a better indicator.
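For comparison, the even/odd approach can be sketched roughly like this: total each player's odd-numbered and even-numbered games separately, then correlate the two totals across players. The function names and sample data are made up, and this is only a sketch of the general split-half idea, not the exact procedure from the earlier posts.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def odd_even_correlation(rows):
    """rows[i][j] = stat for player i in game j+1.

    Correlates each player's total over odd-numbered games (1, 3, 5, ...)
    with his total over even-numbered games (2, 4, 6, ...).
    """
    odd_totals = [sum(row[0::2]) for row in rows]
    even_totals = [sum(row[1::2]) for row in rows]
    return pearson_r(odd_totals, even_totals)


# Hypothetical points-per-game data: 4 players, 6 games each.
sample = [
    [1, 0, 2, 1, 1, 2],
    [0, 0, 1, 0, 0, 0],
    [2, 1, 1, 2, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(odd_even_correlation(sample), 3))
```

Note that each half here only contains half of the games, which is the sense in which the split approach needs twice as much data to say something about a given number of games.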
Some resources:
http://www.socialresearchmethods.net/kb/reltypes.php – gives a good overview of various reliability tests
http://www.countthebasket.com/blog/2008/05/19/regression-to-the-mean/ – from Eli at the now defunct count the basket blog describing reliability. (I think he works for some NBA team now)
http://www.insidethebook.com/ee/index.php/site/article/intraclass_correlation/ – An excellent baseball sabermetric site talking about ICC which is very similar to Cronbach’s
