I've taken a seat on the sidelines for much of the advanced stats discussion over the last year or two, as general Preds coverage at OTF has consumed more and more of my time. While tinkering around with some numbers, though, I ran into a question that the community here might be interested in chiming in on.
What I wanted to do was come up with a properly adjusted Corsi rating that reflects the contextual factors we can quantify: Zone Starts, teammates, competition, and so on. I've seen a factor of 0.8 used in a few instances to adjust Corsi for each incremental non-neutral-zone start (boosting a guy for taking draws in the defensive end, debiting him for easy starts in the offensive one), but I wanted to verify that number.
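To make that adjustment concrete, here is a minimal sketch of how a flat per-zone-start factor would be applied. It assumes raw counts of offensive-zone and defensive-zone starts and a raw Corsi total; the function name, the argument names, and the sample player are hypothetical, not something taken from BTN.

```python
def zone_start_adjusted_corsi(corsi, oz_starts, dz_starts, factor=0.8):
    """Shift a raw Corsi total by `factor` for each net offensive-zone start.

    Assumption: a player with more offensive-zone than defensive-zone starts
    is debited, and a player with more defensive-zone starts is boosted.
    """
    net_oz_starts = oz_starts - dz_starts
    return corsi - factor * net_oz_starts


# Hypothetical player: +50 Corsi, 300 offensive-zone and 250 defensive-zone starts.
# 50 net offensive-zone starts at 0.8 apiece removes 40 from his Corsi.
print(zone_start_adjusted_corsi(50, 300, 250))  # 10.0
```

The whole question is what `factor` should be, which is why it is left as a parameter here.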
So I used the 5-on-5 player downloads from BTN to run this question against both the just-completed NHL season and the four most recent seasons lumped together, starting with 2007-8 (screening for players with >20 GP*), and found something a little different. The following are the results from a series of single-variable regressions, i.e. Zone Starts vs. Corsi, Rel Corsi Qual Comp vs. Rel Corsi, and more:
|Variable|Relation to Corsi: coefficient|Relation to Corsi: R2|Relation to Rel Corsi: coefficient|Relation to Rel Corsi: R2|
|---|---|---|---|---|
|Corsi Qual Comp|-2.2|0.0001|-6.05|0.001|
|Rel Corsi Qual Comp|-34.1|0.06|-1.4|0.0001|
|Corsi Qual Team|27.2|0.43|35.5|0.35|
|Rel Corsi Qual Team|15.7|0.57|2.96|0.02|
Zone Starts jump out as a relevant, independent factor to correct for here, but the coefficient of 1.13 (with an R2 of 0.31) is quite a bit larger than the 0.8 normally used. Looking at just the 2010-11 season I came up with 0.84, while 2007-8 alone gives you 1.28, so there's a pretty fair spread.
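For anyone who wants to check the numbers, a single-variable regression of this kind can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the exact procedure used here: the function name is hypothetical, the sample numbers are made up rather than pulled from BTN, and the units of the zone start variable are an assumption.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R2 is the squared correlation between x and y.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up player-seasons for illustration only (not BTN data).
net_oz_starts = [-60, -25, 0, 30, 55]
corsi = [-70, -20, 5, 25, 80]
slope, intercept, r_squared = simple_ols(net_oz_starts, corsi)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r_squared:.2f}")
```

Running the same fit separately on each season's player list is one way to see how much the slope moves from year to year.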
I'm extremely hesitant to use the Qual Team data, because I don't know that it really tells us much. If two guys play together quite a bit, by definition their Corsi numbers will move in lockstep. If one player is carrying the water on his line, he'll not only lift his teammate's Corsi, by doing so he'll inflate his own QualTeam, after which a corrective factor would make his numbers look worse, and that just doesn't sound right.
Then there's the really strange result. See those R2 values for the Quality of Competition lines? They basically imply that there's no correlation between a player's Corsi and his QoC, which seems surprising. When I ran a multi-variable regression of Corsi against both Zone Starts and Rel Corsi Qual Comp, the results barely budged at all: instead of 1.13 with an R2 of 0.31 for ZS by itself, I got 1.07 for ZS and -12.2 for Rel Corsi Qual Comp, with the same R2 of 0.31.
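The two-variable version can be sketched the same way, by solving the normal equations on mean-centered data. Again, this is a rough illustration under stated assumptions rather than the procedure actually used: the function name and the sample numbers are hypothetical, and a real analysis would more likely lean on a stats package.

```python
def two_predictor_ols(x1, x2, y):
    """Least squares fit of y = intercept + b1 * x1 + b2 * x2.

    Returns (b1, b2, intercept, r_squared).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center every variable so the intercept drops out of the 2x2 system.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    syy = sum(c * c for c in dy)
    det = s11 * s22 - s12 * s12  # zero if the two predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    intercept = my - b1 * m1 - b2 * m2
    # Explained sum of squares divided by total sum of squares.
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return b1, b2, intercept, r_squared


# Made-up player-seasons for illustration only (not BTN data).
zone_starts = [-60, -25, 0, 30, 55, 10]
rel_qual_comp = [0.4, -0.1, 0.2, -0.3, 0.1, -0.2]
corsi = [-70, -20, 5, 25, 80, 0]
print(two_predictor_ols(zone_starts, rel_qual_comp, corsi))
```

One thing worth keeping in mind when reading the output: adding a predictor to a least squares fit can never lower the in-sample R2, so an R2 that stays put at 0.31 means the second variable is explaining essentially nothing beyond what Zone Starts already capture.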
The implications of this could be pretty significant, but I wanted to open the floor to discussion before proceeding. It's been a long time since I've dabbled in the dark arts, so I'd appreciate some peer review.
*When I ran these regressions against all player seasons since 2007-8, then screened down to players with >20 or >40 games played, there was no significant difference in these values. I'm just arbitrarily presenting the >20 GP numbers here.