Beating Save Percentage to Death Part 1: Age
I've looked at even strength save percentage for all goalies from 1998-2010. The data for 1997-1998 for special teams on the NHL.com web site is flawed, so I have omitted it.
Unweighted
A first look at the relationship comes from unweighted data. Each goalie is one data point, whether he played 1 game or 80. Here is the scatter plot:

Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .04 | .00 | .00 | .04 |

| | Sum of Squares | df | Mean Square | F | Significance |
|---|---|---|---|---|---|
| Regression | .00 | 1 | .00 | 1.23 | .27 |
| Residual | 1.22 | 960 | .00 | | |
| Total | 1.22 | 961 | | | |

| | B | Std. Error | Beta | t | Significance |
|---|---|---|---|---|---|
| (Constant) | .90 | .01 | .00 | 129.87 | .00 |
| Age | .00 | .00 | .04 | 1.11 | .27 |

So, the slope of save percentage over age isn't just close to 0.0; to two decimal places it is 0.00, and it is not significantly different from zero (p = .27).
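For readers who want to reproduce this kind of fit, here is a minimal sketch of an unweighted least-squares regression of save percentage on age. It is not the software used for the tables above, the function name `ols_slope` is hypothetical, and the six seasons are made-up numbers for illustration only.

```python
import math

def ols_slope(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, t statistic for b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    t = b / se_b if se_b > 0 else float("inf")
    return a, b, t

# Made-up goalie seasons (age, even-strength save percentage), illustration only.
ages = [22, 25, 27, 30, 33, 36]
sv_pct = [0.915, 0.921, 0.918, 0.924, 0.917, 0.920]
intercept, slope, t_stat = ols_slope(ages, sv_pct)
print(f"slope per year = {slope:.5f}, t = {t_stat:.2f}")
```

Each season counts equally here, whether it represents 1 game or 80, which is exactly the limitation of the unweighted view.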
Weighted
We can rerun the regression, using shots faced as a weighting factor. The scatter plot does not look any more promising:
Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .11 | .01 | .01 | .00 |

| | Sum of Squares | df | Mean Square | F | Significance |
|---|---|---|---|---|---|
| Regression | .00 | 1 | .00 | .26 | .61 |
| Residual | .00 | 21 | .00 | | |
| Total | .00 | 22 | | | |

| | B | Std. Error | Beta | t | Significance |
|---|---|---|---|---|---|
| (Constant) | .91 | .00 | .00 | 261.15 | .00 |
| Age | .00 | .00 | .11 | .51 | .61 |

And, indeed, it is not any better.
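As a rough illustration of what weighting by shots faced does, the sketch below fits a weighted least-squares line in plain Python. The function name `weighted_slope` and the five seasons are assumptions made up for this example; they are not the data behind the table above.

```python
def weighted_slope(x, y, w):
    """Weighted least squares fit of y = a + b*x with weights w; returns (a, b)."""
    w_total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up seasons: the 40-shot season is an outlier that the weights mostly discount.
ages = [21, 24, 28, 31, 35]
sv_pct = [0.850, 0.919, 0.922, 0.917, 0.920]
shots = [40, 1500, 1700, 1600, 1200]

_, b_weighted = weighted_slope(ages, sv_pct, shots)
_, b_unweighted = weighted_slope(ages, sv_pct, [1] * len(ages))
print(f"weighted slope = {b_weighted:.5f}, unweighted slope = {b_unweighted:.5f}")
```

In this toy data the single low-shot season drives most of the unweighted slope, and weighting by shots shrinks it toward zero.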
Serial Measures
One last way to look for a relationship would be to look at each goalie relative to himself as he ages. For this analysis, I have restricted the analysis to goalies with at least 3 seasons. For each goalie, I computed his personal slope. We can then look at the average of these slopes. The null hypothesis is that the average slope is 0.0.
DISTRIBUTION PARAMETER ESTIMATES
========================================================
Slope (N = 140) Mean = -0.001 Variance = 0.000 Std.Dev. = 0.011
0.950 Confidence Interval for mean : -0.003 to 0.001
Once again, the slope is not significantly different from 0.0.
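The reported interval can be checked from the summary figures alone. The sketch below uses a normal approximation rather than the exact t distribution (an assumption on my part; with N = 140 the two agree to three decimals), and `mean_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def mean_ci(mean, std_dev, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary figures as reported above: N = 140, mean = -0.001, std. dev. = 0.011.
low, high = mean_ci(-0.001, 0.011, 140)
print(f"{low:.3f} to {high:.3f}")  # prints "-0.003 to 0.001"
```

The interval straddles zero, which is what "not significantly different from 0.0" means here.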
Conclusion
Analyzing the data every way I can think up, there is no evidence whatsoever of any relationship between goalie age and even strength save percentage.
Comments
Nice post. I’m pretty surprised by the conclusion. I would think most goalies tend to improve their save percentage as they age toward what we might traditionally call their peak age. What happens if we look at only the data from goalies who started their NHL career before they turned 21? Does that group show improvement as they get older?
So this is what it feels like to be famous!
I’ll run that and post it. I suspect the sample sizes are pretty small.
by DoctorMyBrainHurts on Apr 28, 2010 1:40 PM EDT
I put up the data on young goalies:
http://www.behindthenethockey.com/2010/4/28/1449233/beating-save-percentage-to-death
And thank you, Scott.
by DoctorMyBrainHurts on Apr 28, 2010 3:26 PM EDT
…there is no evidence whatsoever of any relationship between goalie age and even strength save percentage.
Really? Then why would a goalie ever retire? Why don’t they all crack the league at 18? Is Patrick Roy at 12yo = Patrick Roy at 27yo = Patrick Roy now?
Maybe a linear regression isn’t quite the right tool for the job.
I ran it as an ANOVA using Age as a categorical variable. Still no significant differences between categories.
by DoctorMyBrainHurts on Apr 28, 2010 1:43 PM EDT
There are a number of problems with your analysis but let me point out two big ones.
The first problem I see is that I can reasonably assume that the relationship between save percentage and age is not linear. It is more likely that a goalie improves in his early 20s, reaches a peak, and then falls off later in his career. If this were the case and the rate of improvement was equal to the rate of descent then yes, a linear regression analysis would result in a slope of around 0. For example, the data set 0.900, 0.905, 0.910, 0.915, 0.910, 0.905, 0.900 has a slope of zero, though clearly, if those are a goalie’s save percentages as he ages, I would draw a completely different conclusion than the one the linear regression analysis gave me.
The second problem I see is in the analysis where you compare goalies to themselves, using goalies with as little as 3 years of data. To me that doesn’t seem like a strict enough limitation, as I am not sure we should expect much difference between a goalie at age 25 and at age 27. Any relationship between save percentage and age is likely to play out over a much longer period than 3 years. Try limiting your analysis to goalies that have played 12+ years and see what you get.
----------------------------------------------------------------------------------
HockeyAnalysis.com - Taking a Deeper Look at the World of Hockey
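The seven-value example in the comment above is easy to check. This is a minimal sketch, assuming the values belong to seasons numbered 1 through 7 (the comment does not give ages); `slope` is a hypothetical helper.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# The seven save percentages from the comment; season numbers 1 to 7 are an assumption.
seasons = [1, 2, 3, 4, 5, 6, 7]
sv_pct = [0.900, 0.905, 0.910, 0.915, 0.910, 0.905, 0.900]

overall = slope(seasons, sv_pct)              # symmetric rise and fall cancel out
rising = slope(seasons[:4], sv_pct[:4])       # seasons 1 to 4 only
falling = slope(seasons[3:], sv_pct[3:])      # seasons 4 to 7 only
print(f"overall {overall:.4f}, rising {rising:.4f}, falling {falling:.4f}")
```

The overall slope comes out at zero up to floating-point rounding, while the two halves have slopes of about +0.005 and -0.005 per season, so a straight line fitted to the whole career hides both trends.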
Re: the first Problem, do you see that pattern in the data as graphed?
Re the second Problem, I would recommend mixed model analysis (I do harp, don’t I), but a quick graph with each goalie as an individual curve should show if this is worth pursuing. I doubt it though. My bet is noise will overcome any signal in this analysis without a whole lot of other factors in the model.
Rule one of statistics is graph it.
As for the overall analysis, I would also be concerned with “sampling” bias. In the NHL a goalie will need to meet a certain “standard” of play. The range may not be great enough for any real pattern to become evident. Most goalies, as their skill declines, will be shifted out of the NHL or their starting role, except for the elite, who may stick either because their bad is another’s good or just out of reputation. Young goalies learning their craft won’t get playing time.
by Mogen_david on Apr 28, 2010 11:38 AM EDT
I agree with mixed model, but my software isn’t that sophisticated and I’m too cheap to spring for SAS. Dallal suggests doing serial measurements the way I did.
http://www.jerrydallal.com/LHSP/serial.htm
by DoctorMyBrainHurts on Apr 28, 2010 1:46 PM EDT
R is free. R is powerful. R is always on the cutting edge. If a new technique is being explored, it is being explored in R.
R is not easy. R has an extremely steep learning curve (like SAS doesn’t). R has lousy documentation. R has no customer support. The programmers in R have no desire or need to meet their “customers’” needs. If they don’t think a technique is kosher or don’t need it themselves, it won’t be done.
In the end, the whole powerful, free and cutting edge thing trumps all other issues. Plus its graphics exceed SAS’s by 10 fold, or perhaps 3 fold (SAS has been working on this; the advantage of SAS is that they need to meet customer demand).
I know it costs money (or effort to find a pirated copy) but Matlab is my package of choice. I wrote the entire back-end of my site in Matlab – regex, matrices, file creation.
I would have used SAS but that is what I work best in. It would have ended up with that clunky SAS look but I could have done it. Plus, I have a copy from work. Although I have to be careful with that. To be honest I would’ve had to cobble together something in R and MySQL.
I still recommend R to all the students I work with since it is a package that you can always use regardless of where you are. The learning curve is steep but since it’s free it doesn’t matter where you go they’ll usually let you use it (who argues with free and cutting edge) and you can always take it home to work on after hours. Almost all the other systems are worth learning but you may find yourself learning a different system when you move on.
I appreciate the suggestion. I have seen R but never spent any time on it. I did a MS in Statistics about 20 years ago, so I did a lot of work in SAS. Plus my wife worked for SAS for many years, so it used to be essentially free for me to use. “Used to be” is the key phrase.
by DoctorMyBrainHurts on Apr 28, 2010 3:32 PM EDT
Hi Gabe, nice post.
There is probably a selection bias involved here in that only goalies who “make it” will be included, amirite? You do reduce that somewhat by comparing goalies to themselves but that still implies that only the goalies that can stick around long enough are counted.
So maybe this is not quite enough for a GM to use as a predictive tool for deciding when to bring goalies into the show, e.g. the dude can’t just say “let’s bring up our 21 year old AHLer because there’s no point in wasting his cheap EVSV% years”.
Still, a very useful tool for predicting the performance of goalies who’ve already made it, wouldn’t you say?
I mean I think that a rational GM (how many of those are there?) would have some sort of evidence-based criteria for bringing a goalie into the show, whether it be scouting or relevant stats or whatnot. From that point onward, what you’ve shown here is that those goalies can, on the whole, be expected to perform just about the same when they’re young as when they’re old.
And of course all the caveats apply, a 40-year-old Hasek won’t be the same as Hasek-in-his-prime, but a rational GM would use evidence-based criteria (stats, or at that age most likely fitness level or injuries) to pull the plug.
But Hasek (or any other goalie) in the years where evidence says they can play, will likely possess an even talent level throughout their playable years.
I dunno, do I have that right? I gotta say, it flies in the face of my intuition. But the numbers are the numbers.
I applaud the objective, but I agree there are a few problems with this analysis.
1st, unweighted data for save percentage is useless; you shouldn’t even bother.
2nd, nobody expects a constantly increasing or decreasing curve wrt age, so you could restrict your analysis to the 18-25 or the 34-40+ range and see if there’s an effect there. As it is, both effects could be present and cancelling each other out.
3rd, selection/survival bias ensures that the average performance of any age slice of NHL players is roughly the same: 37-year-olds score as many points per game as 25-year-olds, there are just fewer of them! So you need to compare only goalies against themselves, as you did, but stretch your range as HockeyAnalysis suggested. If you’re looking only in the 18-25 or 34-40 area that’ll be easier.
As sisu said, it’s IMPOSSIBLE that there be no age effect, otherwise NHL-caliber goalies would play constantly from age 18 to 100. The effect could be milder than expected, however.
There’s also a survivor effect. Goalies get to play as long as they look good; if they post an .885 save percentage in the modern era, then they get cut because nobody wants to wait around long enough to figure out what their true talent is.
So goaltender careers – other than for the big guys – tend to last 3 or 4 or 5 years from the time that the goaltender breaks in to the league (ie – he’s bound to put up a bad year and get replaced). But how did he break in? He came up and posted a high save percentage right away. So we take the guys who were either lucky or good or both and let them play until they’re no longer lucky. That warps the aging curve.
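The survivor effect described above can be sketched with a toy simulation. Everything in it is an assumption chosen for illustration, not an estimate from real data: every goalie has the same constant true talent (.910), faces 1500 shots a season, and is cut after any season below .900; season-to-season luck is drawn from a normal approximation to the binomial.

```python
import math
import random

def simulate_career(rng, talent=0.910, shots=1500, cut_below=0.900, max_seasons=10):
    """Observed save percentages for one goalie whose true talent never changes."""
    noise_sd = math.sqrt(talent * (1 - talent) / shots)  # binomial sampling noise
    seasons = []
    for _ in range(max_seasons):
        observed = rng.gauss(talent, noise_sd)  # normal approximation to the binomial
        seasons.append(observed)
        if observed < cut_below:
            break  # one bad-looking year ends the career
    return seasons

rng = random.Random(2010)  # fixed seed so the run is repeatable
careers = [simulate_career(rng) for _ in range(5000)]
cut_careers = [c for c in careers if c[-1] < 0.900]
final_seasons = [c[-1] for c in cut_careers]
earlier_seasons = [s for c in cut_careers for s in c[:-1]]

mean_earlier = sum(earlier_seasons) / len(earlier_seasons)
mean_final = sum(final_seasons) / len(final_seasons)
print(f"earlier seasons: {mean_earlier:.4f}, final seasons: {mean_final:.4f}")
```

Even though no simulated goalie ever gets better or worse, the goalies who are cut show a final season well below their earlier ones, so a naive within-goalie comparison would read the cut as decline. That is one way selection alone can warp an aging curve.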
I just posted a simple, straightforward analysis that shows goalie save percentage does vary by age in a manner that we would all expect: http://www.hockeyanalysis.com/?p=927
by HockeyAnalysis on Apr 28, 2010 10:47 AM EDT
Between the ages of 30 and 34 a goalie’s save percentage plateaus before starting a steady decline until age 40. This is very much as one would expect and it only takes looking at the data in the proper way to confirm our expectations.
If ever there was a perfect statement about the use (or misuse) of statistics in modern life.
Glen Sather is a Hockey Genius.
http://glensathersucks.com/
http://twitter.com/ThGeneralissimo
Yes, that sounds bad, but I didn’t mean that I could manipulate stats to explain what I would expect, but rather if you analyze the data correctly you will come up with conclusions that often reaffirm what you would expect.
Nevertheless, I have updated my analysis by charting a goalie’s save percentage relative to the league average save percentage, and the results still show a strong relationship between goalie performance and age. http://www.hockeyanalysis.com/?p=928
by HockeyAnalysis on Apr 28, 2010 8:04 PM EDT
I don’t have data going back that far, so I’m making a few assumptions.
There is a slight trend in my data (not statistically significant) for save percentage to rise from season to season. I suspect that effect is greater if you go further back. If so, that could be confounding your results.
You picked 19 goalies out of thousands who played in that era. I suspect selection bias plays some role. The only reason those guys stuck around that long was they were good enough to justify their cost relative to a younger goalie.
“Goalies who get better year after year for 7 years stick around for 14 years” seems a perfectly reasonable conclusion.
by DoctorMyBrainHurts on Apr 28, 2010 1:52 PM EDT
What about Europe / AHL as a source of data?
So I completely understand if this is difficult / impossible to track down, but to find out how goalies age, maybe European keepers would be a better source of data.
My thinking is that teams in these leagues don’t necessarily face the same pressures to win. At a minimum, it’s fair to assume they don’t have the ready supply of keepers to continually dump goalies once they show any signs of not being one of the 60-70 best in the world. As a result, you may get a larger set of players who play in the same league from their early 20s to mid 30s. Additionally, they’re still professional goalies, so I would think they’d still show a similar aging pattern to what you would expect from goalies who are good enough to make the NHL – the peak would likely not be as high, but the pattern for how they get there should still be the same.

















