My goal with the "Sample Size War" is to get a better understanding of some of the widely used statistics here and in other places. Today I'm posting on goals scored by forwards.
The idea for this started way back in a LoES post about what the "best" statistic was, and much to my amazement most of the bloggers chose TOI, which is really a measure of how much a coach values a player. I found this a little odd, and then realized that no one had truly compared traditional stats with advanced stats. So in a way I'm rehashing that article, but letting the stats themselves make the argument. I decided to use reliability statistics to evaluate goals for forwards, and I came to the conclusion that it takes about 38 games for goals to be reliable. I'll go through my methods (it's nerd-tastic, so you can skip ahead if you want), results, and then discuss the implications.
I began by selecting players that had played 3 consecutive years with at least 40 games played each year (I'll discuss the implications of this later). I sampled 272 forwards and 140 defensemen. I then grabbed game logs for all of the players to compile goals per game, up to 120 games for each player. I started with split-half season reliability, in which even and odd games are split and a correlation is generated between the two halves. But after looking into more options for evaluating reliability, I found a statistic that has been used in education, the psychiatric sciences, and a few other places. Cronbach's alpha is very similar to a split-half correlation, but it proves to be much more valuable: it effectively takes the correlation across all possible splits of the data, not just even and odd games, and it can be used to evaluate a stat without having to gather twice the number of games. I included both measures for people who want to compare them. Initially I wanted to take a sample and draw conclusions from that set, but I decided to use the entire population in this first article so I could discuss the implications of doing so. Once I gathered all 272 players' worth of data, I generated Cronbach's alpha and the split-half correlation, and looked at how many games it took to cross the 0.7 coefficient mark. From this data I then generated a few sample size graphs to illustrate how sampling may vary, and to give myself a better idea of how many players I should sample for future stats.
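For anyone who wants to play along at home, the two measures can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not my actual worksheet; the function names and the players-by-games array layout are my own, with each game treated as an "item" and each player as a case:

```python
import numpy as np

def cronbach_alpha(goals):
    """Cronbach's alpha over a (players x games) array of
    per-game goal counts; each game is treated as an 'item'."""
    goals = np.asarray(goals, dtype=float)
    k = goals.shape[1]
    item_vars = goals.var(axis=0, ddof=1)      # variance of each game
    total_var = goals.sum(axis=1).var(ddof=1)  # variance of total goals
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def split_half_r(goals):
    """Split-half reliability: correlate goals scored in
    even-numbered games with goals scored in odd-numbered games."""
    goals = np.asarray(goals, dtype=float)
    evens = goals[:, 0::2].sum(axis=1)
    odds = goals[:, 1::2].sum(axis=1)
    return np.corrcoef(evens, odds)[0, 1]
```

If every player's per-game scoring were perfectly consistent, both measures would return 1.0; real goal data sits well below that at small game counts.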
Goal reliability for forwards, first in graphical format:
From the data we find that Cronbach's alpha reaches the 0.7 mark at 38 games, while the split-half correlation reaches 0.7 at 41 games. At 120 games, Cronbach's alpha rests at about 0.85. The 95% confidence interval for the game at which Cronbach's alpha crosses 0.7 is 27-51 games.
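One standard way to get an interval like that is to bootstrap over players: resample them with replacement many times and record where alpha crosses 0.7 in each resample. The sketch below illustrates that idea only; the function name and parameters are assumptions for illustration, not necessarily the exact procedure used here:

```python
import numpy as np

def alpha(goals):
    """Cronbach's alpha for a (players x games) array."""
    k = goals.shape[1]
    return k / (k - 1) * (1 - goals.var(axis=0, ddof=1).sum()
                          / goals.sum(axis=1).var(ddof=1))

def crossing_ci(goals, threshold=0.7, n_boot=1000, seed=0):
    """Bootstrap a 95% CI for the number of games at which alpha
    first crosses the threshold, resampling players with replacement."""
    rng = np.random.default_rng(seed)
    n = goals.shape[0]
    crossings = []
    for _ in range(n_boot):
        sample = goals[rng.integers(0, n, n)]   # resample players
        for k in range(2, sample.shape[1] + 1):
            if alpha(sample[:, :k]) >= threshold:
                crossings.append(k)
                break
    return np.percentile(crossings, [2.5, 97.5])
```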
I randomly assigned the 272 players numbers 1-272 and then calculated Cronbach's alpha from that data set. On the x-axis is the sample size; on the y-axis is Cronbach's alpha at 38 games. We can see it is quite variable until about 100 players or so. The major movement around 50 is Corey Perry, with Sidney Crosby creating the movement around 200. I think this shows the importance of taking a significant sample.
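The shape of that graph is easy to reproduce on any data set: fix the game count at 38 and recompute alpha as the randomly ordered player list grows. A small sketch of that loop (the helper names and step size are mine, for illustration):

```python
import numpy as np

def alpha(goals):
    """Cronbach's alpha for a (players x games) array."""
    k = goals.shape[1]
    return k / (k - 1) * (1 - goals.var(axis=0, ddof=1).sum()
                          / goals.sum(axis=1).var(ddof=1))

def alpha_by_sample_size(goals, n_games=38, step=10):
    """Alpha at a fixed game count, recomputed as the randomly
    ordered player list grows, to expose sampling noise."""
    return [(n, alpha(goals[:n, :n_games]))
            for n in range(step, goals.shape[0] + 1, step)]
```

Plotting the (sample size, alpha) pairs recreates the jumpy left side of the graph: one unusual scorer entering the sample can move alpha noticeably until the sample gets large.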
Here is another graph illustrating the importance of sample size. Using the same set as above, this time I looked at the point where Cronbach's alpha reached 0.7 as a function of sample size. Again we see significant variability until about 150 players.
We can conclude that it takes about 38 games before we can say that the goals statistic is reliable for forwards. My suspicion is that this population produces a very conservative estimate, and that it probably takes more games to evaluate the entirety of the forwards that have played in the NHL. In order to get to 120 games I had to select a specific group of forwards, and consequently the players in this data set are mostly consistent goal scorers, which makes the goals statistic appear more reliable (i.e., it arrives at 0.7 sooner).
I'd also like to briefly discuss correlation coefficients. Every industry sets different standards for correlation (r) and regression (R^2) metrics. I looked into a lot of different sources to determine what I felt was reasonable for working with hockey numbers. At this point it seems reasonable to me to follow in the footsteps of other sabermetric work (mostly baseball) and treat a correlation above 0.707 as a good benchmark. Applied to a regression, this accounts for 50% of the variability (0.707^2 = 0.5). And hell, if it's good enough for government work, it's good enough for me. This is subject to change, though, and like many other things its importance is contingent upon comparison with other statistics. Realistically the number doesn't matter as much as how it compares across stats.
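The arithmetic behind that benchmark is just the r-to-R^2 conversion:

```python
import math

# A correlation of r = sqrt(0.5) ≈ 0.707 explains exactly half the
# variance in a simple linear regression, since R^2 = r^2.
r = math.sqrt(0.5)
print(round(r, 4))       # 0.7071
print(round(r ** 2, 2))  # 0.5
```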
Hopefully I'll be putting up defenseman goal reliability soon; as of now it looks to be significantly higher than for forwards.