Statistics are used in many ways. In hockey, they are predominantly used for counting goals, shots, saves, time on ice, etc. In the scientific realm, statistics are used the same way when comparing counting statistics, but they are also used to work with probabilities. In either case, a person is likely to look at two different samples and come to some sort of conclusion, like "Hey, Sample A is larger than Sample B; this must mean that [conclusion]".
The problem with making such a comparison is that, unless the subject being studied has full control over the outcome, there is a degree of chance involved. For example, when a shooter with the puck makes a move, he can affect the chance of scoring, but other factors are involved too, like the opposing defenders, the goaltender, the ice surface, etc. The main focus in the scientific statistical community is determining how likely it is that a difference between two samples reflects an actual difference rather than outside variables (i.e., luck). Statisticians run tests to determine whether a result is "statistically significant".
Almost exactly a year ago, Cam Charron compared how goaltenders fared under two coaches who have a reputation for improving their goalies. The two articles looked at save percentage with and without Dave Tippett and Ken Hitchcock. They found that, on average, goaltenders' save percentage did improve under both coaches (by 0.006 and 0.004); however, the improvement was small and not always in an upward direction. The question that often came up was: is this due to better systems from these coaches, or is it just random variance?
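To get a feel for how much "random variance" can move save percentage on its own, one rough yardstick is the binomial standard deviation of a save percentage, sqrt(p(1 - p) / n). The sketch below assumes, purely for illustration, a goalie whose true talent is a .910 save percentage facing 1,500 shots; neither number comes from Cam's data. It also treats every shot as an independent trial with the same save probability, which is a simplification.

```python
import math


def save_pct_sd(true_sv_pct: float, shots: int) -> float:
    """Standard deviation of observed save percentage from luck alone.

    Assumes each shot is an independent trial with the same save
    probability (a binomial model), which is a simplifying assumption.
    """
    return math.sqrt(true_sv_pct * (1.0 - true_sv_pct) / shots)


# Hypothetical goalie: .910 true talent, 1,500 shots faced in a season.
sd = save_pct_sd(0.910, 1500)
print(f"One standard deviation from luck: {sd:.4f}")  # prints 0.0074
```

Under those assumptions, one goalie's single-season save percentage can drift by roughly 0.007 from luck alone, the same order of magnitude as the 0.004 to 0.006 average improvements in question, which is why a formal test is more useful than eyeballing the numbers.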
I asked Cam if he had done any statistical significance testing and, if not, whether I could. He gave me his blessing, and here we are months later as I'm finally getting to it.
Goodness-of-Fit Test (AKA Chi-Squared or X^2 Test)
Since most people on here care about hockey more than math, I'll keep it quick and simple. A goodness-of-fit test essentially measures how likely it is that the difference between an observed sample and what was expected arose from chance alone. In other words, we'll see how likely it is that the goaltenders' improvement under these coaches is just random chance rather than a real coaching effect.
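For readers who do want the math, Pearson's chi-squared statistic for k categories (here, one per goalie) compares each observed count O_i with its expected count E_i. The usual convention for a goodness-of-fit test, assumed here rather than stated in the original analysis, is to compare it against a chi-squared distribution with k - 1 degrees of freedom:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad \mathrm{df} = k - 1,
\qquad p = P\left(\chi^2_{\mathrm{df}} \ge \chi^2_{\mathrm{obs}}\right)
```

A small chi-squared value (and a large p) means the observed counts sit close to what was expected; p is the probability of a discrepancy at least this large if chance alone were at work.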
Because of how it works, the test is best suited to frequency (count) data. So the comparison will be between the number of saves made under the coach's system and the number of saves expected if the goalie faced the same number of shots with the same relative success he had without the coach.
Step 1: Expected Save Percentage
The first step is to calculate the expected save percentage if a goalie was equally effective in a different year. We do this by normalizing for year-to-year differences in league-wide save percentage, using the same method Cam did in his articles. We take a goalie's save percentage without the coach, determine how it differs from the league average in those same seasons, then apply that same difference to the league average in the seasons he played under the coach. For example, if a goalie was 0.002 below league average without the coach, his expected number of saves would be based on a save percentage 0.002 below the league average in the seasons he was with the coach.
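The normalization can be sketched in a few lines of Python. The function and argument names are hypothetical, the league averages are inputs you would look up for the relevant seasons, and the example numbers are invented for illustration rather than taken from the tables.

```python
def expected_saves(
    sv_pct_without: float,
    league_avg_without: float,
    league_avg_with: float,
    shots_with: int,
) -> float:
    """Expected saves under the coach if the goalie kept the same gap to league average."""
    # How far above (+) or below (-) league average the goalie was without the coach.
    relative = sv_pct_without - league_avg_without
    # Apply the same gap to the league average for the seasons under the coach.
    expected_sv_pct = league_avg_with + relative
    return expected_sv_pct * shots_with


# Hypothetical goalie: .908 without the coach when the league averaged .910,
# then 1,000 shots faced under the coach in seasons where the league averaged .914.
print(round(expected_saves(0.908, 0.910, 0.914, 1000), 1))  # 912.0
```

In practice the "without" and "with" figures would each span several seasons, so the league averages would need to be weighted by the shots the goalie faced in each season; that weighting is left out of this sketch.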
For starters, let's look at goalies under Ken Hitchcock:
And now under Dave Tippett:
Step 2: Differences from Expected
From there, it's simply a matter of determining how many shots each goalie faced and using the statistical test to compare the saves he actually made with the saves expected.
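As a sketch of that comparison, here is Pearson's chi-squared statistic, the sum of (O - E)^2 / E over goalies, with each goalie's actual saves as the observed count and the normalized expected saves as the expected count. The three goalies' numbers below are made up; the real inputs would be the rows of the tables.

```python
from typing import Sequence


def chi_squared(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical actual vs. expected saves for three goalies.
actual = [912, 1830, 455]
expected = [905.0, 1841.0, 450.0]
print(round(chi_squared(actual, expected), 3))  # 0.175
```

Each goalie contributes his squared miss scaled by his expected total, so a few extra saves matter less for a goalie who faced thousands of shots than for one who faced a few hundred.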
For goalies under Ken Hitchcock:
*I actually jimmied the numbers a bit here, as I added in the 2012-13 season for Halak. This isn't technically fudging the numbers, as all it is doing is increasing the sample size.
And now for goalies under Dave Tippett:
The X^2 value is then 1.88 for Hitchcock and 0.61 for Tippett. What does that mean? Well, skipping the fancy math, you can look the value up in a chi-squared table using the degrees of freedom, which depend on how many categories are being compared (in this case, how many goalies), and read off how likely it is that a difference this large arose from chance alone. There is a more precise way using a computer program, but it isn't currently installed on my laptop.
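The "more precise way" is to compute the p-value directly from the chi-squared distribution. SciPy provides this as `scipy.stats.chi2.sf`; the sketch below instead sticks to Python's standard library, computing the survival function from the series expansion of the regularized lower incomplete gamma function. It assumes k - 1 degrees of freedom for k goalies (13 for Hitchcock, 8 for Tippett), the usual goodness-of-fit convention; the exact degrees of freedom used in the original table lookup are not stated.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P.
    Fine for moderate values like these; not meant for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


# Chi-squared values from the article; df = goalies - 1 is an assumption.
for coach, stat, goalies in [("Hitchcock", 1.88, 14), ("Tippett", 0.61, 9)]:
    p = chi2_sf(stat, goalies - 1)
    print(f"{coach}: chi-squared = {stat}, p = {p:.4f}")
```

Under that degrees-of-freedom assumption, both p-values come out above 0.99, in line with the conclusions drawn from the table.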
For Hitchcock's fourteen goaltenders, a chi-squared value of 1.88 corresponds to a p-value above 0.99: if Hitchcock had no real effect, random variation alone would produce a gap between expected and actual saves at least this large more than 99% of the time. The two distributions (expected and actual) are very similar, so the difference can most likely be attributed to chance.
Likewise, for Tippett's nine goaltenders, a chi-squared value of 0.61 also corresponds to a p-value above 0.99: if Tippett had no real effect, random variation alone would produce a gap at least this large more than 99% of the time. Again, the two distributions are very similar, so the difference can most likely be attributed to chance.
For comparison's sake, in the world of science the common standard is a p-value below 5%: a difference that large should show up less than 5% of the time from random chance alone before it is considered statistically significant. Stricter thresholds, such as 2.5% or 1%, are sometimes used.
What can a Winnipeg Jets' fan make of this?
It's very common for hockey fans to excuse their goaltender, saying that under a different system or with a better defense he would make more saves and therefore have a better save percentage. While this makes logical sense and is probably partly correct, all indications point to it being a very minor factor, so minor that it's almost insignificant.
The best defensive-minded coaches improved their goaltenders by 0.004 and 0.006 in save percentage, and even that seems to be due more to "luck" (i.e., variance) than anything else. It is most likely, then, that these coaches improve their goaltenders even less than this, if at all. For Ondrej Pavelec, this means it would be reasonable to believe his save percentage may improve on another team, but his career number would likely move only from .906 to around .908, still far below league average.
In the end, this was an exercise in peer review and not really anything Jets-specific. As a community, we can only strengthen our case by consistently re-checking and re-evaluating the data and hypotheses we come up with. This article is just one more in a long line of analyses indicating that team effects influence save percentage only to a very small degree.
As you can read in the comments, there was a mistake in one of the calculations (thanks to Eric for noticing). It has since been corrected, and the correction changes the data considerably, but for the better. I had also made a Stats 101 error in translating my chi-squared values to p-values (something I believe Garik noticed in the comments section below). Ironically, these two errors mostly cancel each other out. In fact, the case that teams do not affect save percentage becomes even stronger.
These two fixes highlight two important points: none of us is infallible, and it's important to look at the evidence for yourself. This is why you can only refute evidence with more evidence, and why the scientific method is so important. Only through peer review and digging in for ourselves do we improve our knowledge most efficiently. Simply dismissing results because they aren't what you like, want, or expect doesn't improve anything. This matters even in the simplest of cases, like this one, which came down to fixing two small errors in spreadsheet formulas.
I only changed the tables to the correct numbers and updated the p-values in the conclusion. The end result, however, is the same: there is no statistically significant difference between the expected and actual number of saves the goaltenders made with and without Hitchcock or Tippett.