Recently a lot of people have been talking about sample sizes. The topic often comes up with regard to the reliability of CORSI/FENWICK over more traditional measures, and in relation to team strength, playoff success, and momentum. Personally, I feel it best illustrates why observation can be misleading. But what is sample size? In statistics it's often referred to as "N", and it is a huge part of pre-study analysis in everything from healthcare to economics and virtually every other industry. Having sufficient data is always the biggest concern with any analysis. In a series of articles I'm going to do my best to come up with some concrete numbers that allow people to actually gauge adequate sample sizes, and hopefully illustrate how statistical measures differ in the sample sizes they need. In this way we will hopefully be able to estimate how many games or seasons it takes to truly validate a statistic.
We begin with a preface: before we jump into anything, I think it's important to look at the big picture, namely the flash of light that an NHL career truly is. Unfortunately for many players, the NHL is a results-driven industry that doesn't have time to wait and see if results are deserved. We first look at the typical life span of an NHLer.
I decided to use a Kaplan-Meier-style curve to look at NHLers. I created two graphs that display all retired NHL players from about 20-30 years' worth of data, looking at both how many seasons and how many games they played. The y-axis represents the cumulative proportion of players left in the NHL; the x-axis is time in either seasons or games. Thus, if you want to know the probability of a player lasting 82 games in the NHL, just go to 82 on the x-axis, up to the curve, and trace it over to the y-axis for your probability.
I included confidence intervals (CI) just in case some of you were suspicious. Believe me, there is more than enough data in these two samples (N > 2000).
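For anyone who wants to build this kind of curve themselves, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. It is not the spreadsheet I used; the function names (`kaplan_meier`, `survival_at`) and the career lengths are made up purely for illustration. It uses the textbook convention that the curve at time t is the estimated probability of a career lasting longer than t. Since every player in my data is retired, nothing is censored, and the estimate reduces to the simple proportion of players still around.

```python
from collections import Counter


def kaplan_meier(durations, observed=None):
    """Return [(t, S(t))] at each distinct exit time, where S(t) = P(career > t).

    durations: career lengths (games or seasons).
    observed:  True if the career is known to have ended, False if censored
               (still active). Defaults to all True, i.e. all players retired.
    """
    if observed is None:
        observed = [True] * len(durations)
    ended = Counter(t for t, o in zip(durations, observed) if o)
    leaving = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = ended.get(t, 0)
        if d:
            # Kaplan-Meier product step: multiply by (1 - d / n_at_risk)
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Read the step function: the last estimate at or before time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical career lengths in games (NOT real NHL data).
career_games = [5, 20, 20, 60, 117, 150, 300, 450, 800, 1100]
curve = kaplan_meier(career_games)
print(round(survival_at(curve, 82), 3))  # share of toy careers lasting past 82 games
```

Reading the curve at 82 is the code version of going to 82 on the x-axis, up to the curve, and across to the y-axis.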
Both of these graphs illustrate the dramatic drop in players over a short amount of time. By a player's 117th NHL game, half of the NHL players he started with will never play another game. From the above graph we see that this represents about 2-3 NHL seasons. Those who make it to their 1000th NHL game are in the 5th percentile (or 95th, however you want to look at it). If you're interested in the actual data, I could send you the Excel sheet I created, or you can use the equation on the right of the games graph, which gives you a decent estimate.
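If you would rather pull numbers like these straight from a list of career lengths instead of reading them off a graph, a rough sketch follows. It assumes every career in the list is finished (which holds for retired players), and both the helper names and the toy numbers are hypothetical, not my actual data.

```python
def median_career_length(career_games):
    """Smallest game count by which at least half of the careers have ended."""
    ordered = sorted(career_games)
    half = (len(ordered) + 1) // 2  # how many careers make up "half"
    return ordered[half - 1]


def share_reaching(career_games, milestone):
    """Fraction of careers that lasted at least `milestone` games."""
    return sum(g >= milestone for g in career_games) / len(career_games)


# Hypothetical career lengths in games (NOT real NHL data).
toy_careers = [5, 20, 20, 60, 117, 150, 300, 450, 800, 1100]

print(median_career_length(toy_careers))   # game count by which half are gone
print(share_reaching(toy_careers, 1000))   # share reaching the 1000-game mark
```

On the real data, the first function is what produces the "half are gone by game 117" figure, and the second is what produces the share of players reaching 1000 games.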
So we begin our search with a preface and a reminder that, unfortunately, most players may never see enough games to warrant their exit from the show. That is something we should keep in mind as we move forward in our search for sample sizes.