Ben already posted about my Expected Goals paper from the 2012 MIT Sloan Sports Analytics Conference, but I thought I'd write a post too, so that I can describe the methods more succinctly and in less formal language than what's in the paper, and also explain how you can compute expected goals for yourself...
This work was inspired by Fenwick and Corsi. Fenwick and Corsi are used because they are better indicators of performance than goals over periods of time that are shorter than one season. Corsi is Shots + Misses + Blocks, but it is unlikely that those are the best weights to assign to the stats. For example, how do we know that 2*Shots + Misses + Blocks isn't better? Or 3*Shots + 2*Misses + 1*Blocks? Or maybe a*Shots + b*Misses + c*Blocks for some other a, b, and c? (Note: All of the stats mentioned in this post are per 60 minutes at even strength.) So we could run a regression using Shots, Misses, and Blocks in odd games and Goals in even games to find the "best" a, b, and c. If we do this, we get 4, 1, and 2 to be the best weights: 4*Shot + 1*Miss + 2*Block.
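To make the odd/even regression idea concrete, here is a minimal sketch in Python. It is not the code or data behind the paper: the team numbers are made up, the helper name fit_weights is hypothetical, and the "true" weights are simply chosen to have the same 4:1:2 ratio mentioned above so you can see the fit recover them. It assumes NumPy is installed.

```python
import numpy as np

def fit_weights(odd_stats, even_goals):
    """Ordinary least-squares fit of even-game goals on odd-game stats.

    odd_stats:  (n_teams, n_stats) array of per-60 rates from odd games.
    even_goals: (n_teams,) array of goals per 60 from even games.
    Returns (intercept, weights).
    """
    # Add a column of ones so the fit includes an intercept term.
    X = np.column_stack([np.ones(len(odd_stats)), odd_stats])
    coef, *_ = np.linalg.lstsq(X, even_goals, rcond=None)
    return coef[0], coef[1:]

# Made-up data for illustration only (NOT real NHL numbers).
rng = np.random.default_rng(0)
n_teams = 30
odd_stats = np.column_stack([
    rng.normal(30, 2, n_teams),  # shots per 60 in odd games
    rng.normal(12, 1, n_teams),  # misses per 60 in odd games
    rng.normal(14, 1, n_teams),  # blocks per 60 in odd games
])

# Assumed weights in a 4:1:2 ratio; noise-free so the fit recovers them.
true_weights = np.array([0.04, 0.01, 0.02])
even_goals = 0.5 + odd_stats @ true_weights

intercept, weights = fit_weights(odd_stats, even_goals)
print(intercept, weights)
```

With real data the even-game goals are noisy, so the fitted a, b, and c are only estimates, and adding more columns (hits, faceoffs, and so on) works the same way.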
But let's not stop there... let's also include a whole bunch of other team statistics in addition to shots, misses, and blocks... stats like goals, hits, faceoffs, zone starts, etc. Then, let's see which combination of those statistics in odd games is the best predictor of goal scoring rate in even games.
I want to emphasize that we are building this not to explain goal scoring rate, but to predict goal scoring rate in even games based on stats in odd games. Various team stats in odd games are the predictors, and Goals in even games are the outcome.
So if we run a regression, we find that the significant predictors are goals, shots, net hits (hits against minus hits), and total faceoffs. It is not surprising that goals and shots are significant. Hits and total faceoffs might be a little surprising. See the paper and the comments here for a discussion. One interesting note is that hits against are a good predictor of goal scoring rate, possibly because hits against give info about possession. Misses and blocks don't make the final cut, possibly because they aren't adding any additional information about possession that isn't already included in shots and hits.
Check out the paper for some pictures showing how this model performs in comparison to Fenwick and Corsi. Here's a summary:
The correlation between Fenwick or Corsi in odd games and goals in even games is about 0.37.
The correlation between expected goals in odd games and goals in even games is about 0.55.
The corresponding predicted R^2 goes from 0.14 for Fenwick/Corsi to 0.31 for expected goals.
So, in terms of R^2, expected goals is about twice as good as Fenwick or Corsi at predicting goal scoring rates.
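As a rough sanity check on those numbers, squaring each correlation gets you close to the quoted R^2 values. This is only back-of-the-envelope arithmetic on the rounded figures above; the predicted R^2 in the paper may be computed differently, which would explain why 0.55 squared lands at 0.30 rather than exactly 0.31.

```python
# Rounded correlations quoted above (odd-game stat vs. even-game goals).
corr_fenwick_corsi = 0.37
corr_expected_goals = 0.55

# Squared correlation as a rough stand-in for R^2 (an approximation,
# not necessarily how the paper's predicted R^2 was computed).
r2_fenwick_corsi = corr_fenwick_corsi ** 2    # about 0.14
r2_expected_goals = corr_expected_goals ** 2  # about 0.30

# Ratio of the R^2 values actually reported above.
reported_ratio = 0.31 / 0.14                  # about 2.2

print(round(r2_fenwick_corsi, 2), round(r2_expected_goals, 2), round(reported_ratio, 1))
```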
Expected goals tend to be conservative estimates of future goal scoring rates, but that is probably what we want. As mentioned on this site several times by many people, goal scoring rates "regress to the mean", or at least "regress to some mean", which may depend on the team or player. One way to estimate this mean is by using this expected goals stuff.
DanTheStatMan1 had asked about NYR and NSH in the comments of Ben's post. Here is how you can compute expected goals for any team:
-1.34 + 0.31*Goals + 0.03*Shots + 0.03*HitsAgainst - 0.03*Hits + 0.04*TotalFaceoffs
Not as nice-looking or easy to remember as Shot + Miss + Block, but we use computers to do the calculations anyway, so it's all good.
So pick your favorite team, and plug in their goals, shots, hits against, hits, and total faceoffs (per 60 minutes at even strength) at some point in the middle of the season, and you get their expected goals scoring rate (per 60 minutes of ice time at even strength) for the rest of the season. (For short, I've been calling this "expected goals", but really it is "expected goals per 60 minutes of ice time at even strength".)
Since there are a bunch of Winnipeg Jets fans here, I'll give an example using Atlanta's stats from last year. Here are Atlanta's even strength stats (per 60 minutes) in half their games last year: Goals 1.89, Shots 31.51, Hits Against 22.81, Hits 19.99, Total Faceoffs 48.41.
So we compute their expected goals like this:
-1.34 + 0.31*(1.89) + 0.03*(31.51) + 0.03*(22.81) - 0.03*(19.99) + 0.04*(48.41) = 2.21
This gives an expected goal scoring rate of 2.21 for the rest of their games. The actual goal scoring rate for the rest of their games was 2.31. I can hopefully give up-to-date expected goals for this season (as of yesterday's games) later tonight or tomorrow.
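If you'd rather not do this by hand, the formula drops straight into a few lines of Python. The coefficients and the Atlanta inputs are the ones from this post; the function and argument names are just my own labels for this sketch.

```python
def expected_goals(goals, shots, hits_against, hits, total_faceoffs):
    """Expected goals per 60 minutes at even strength.

    All inputs are per-60 rates at even strength, using the
    coefficients given in this post.
    """
    return (-1.34
            + 0.31 * goals
            + 0.03 * shots
            + 0.03 * hits_against
            - 0.03 * hits
            + 0.04 * total_faceoffs)

# Atlanta's even-strength stats from half of last year's games.
atlanta_xg = expected_goals(
    goals=1.89,
    shots=31.51,
    hits_against=22.81,
    hits=19.99,
    total_faceoffs=48.41,
)
print(round(atlanta_xg, 2))  # 2.21
```

Swap in any other team's per-60 even strength numbers to get their expected goal scoring rate.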
You can use that formula for players also. Any of the On/Off ice statistics (at http://www.behindthenet.ca, for example) can be computed with expected goals instead of Fenwick or Corsi. QualTeam and QualComp can also be computed using expected goals. Then, you can compute Expected Goals rel QualTeam and Expected Goals rel QualComp.
Regression-based adjusted plus-minus statistics can also use expected goals instead of goals, which is what I did in the paper. These give a player's contribution to his team in terms of expected goals per 60 minutes at even strength, independent of teammates, opponents, and zone starts.
For expected goals against, use the same formula, but plug in goals against, shots against, etc. instead of goals, shots, etc. For expected goal differential, compute expected goals minus expected goals against.
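Here is a small sketch of the "against" and differential calculations. The "for" numbers are Atlanta's from the example above, but the "against" numbers are made up for illustration, and the helper names are hypothetical. One assumption to flag: on the "against" side I've treated the opponent's hits against as the team's own hits, and vice versa, which is my reading of "etc." rather than something spelled out here.

```python
def expected_goals(goals, shots, hits_against, hits, total_faceoffs):
    """Expected goals per 60 at even strength (coefficients from this post)."""
    return (-1.34 + 0.31 * goals + 0.03 * shots
            + 0.03 * hits_against - 0.03 * hits + 0.04 * total_faceoffs)

def expected_goal_differential(stats_for, stats_against):
    """Expected goals minus expected goals against.

    Each argument is a dict of per-60 even-strength rates keyed by the
    argument names of expected_goals; stats_against holds the same
    stats from the opponents' side.
    """
    return expected_goals(**stats_for) - expected_goals(**stats_against)

# "For" side: Atlanta's numbers from the worked example.
stats_for = dict(goals=1.89, shots=31.51, hits_against=22.81,
                 hits=19.99, total_faceoffs=48.41)

# "Against" side: goals and shots against are MADE UP for illustration.
# Assumption: hits and hits against swap roles, faceoffs stay the same.
stats_against = dict(goals=2.10, shots=30.00, hits_against=19.99,
                     hits=22.81, total_faceoffs=48.41)

xg_diff = expected_goal_differential(stats_for, stats_against)
print(round(xg_diff, 2))
```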
I'd be interested in any feedback you might have, and it'd be great to hear how this performs with any other data sets out there.
PS fyi, I finally caved and got a Twitter account: @bmacBrianMac.