I guess it was only a matter of time until I put this together.
On Google Documents (found here), I have placed an all-time GVT database that covers the NHL, WHA, AHL, IHL, the Swedish, Finnish, Russian, Czech and German elite leagues, as well as some of the Canada Cups and Olympic tournaments. More after the jump.
There is both a text file, multi_gvt.txt, and an Excel file, multi_gvt2.xls. The text file contains all the players; the Excel file only contains the first several hundred, as the site didn’t allow me to upload spreadsheets larger than ~10 MB (!). You can download the text file then paste it into an Excel spreadsheet.
Calculating the GVT data itself was fairly simple, although for many of these leagues the data I was able to find was quite rudimentary: Games, Goals and Assists. For recent years, goaltender shots against and player plus/minus are available, which at least gives us a skeleton of defensive information to work with. For now, I have restricted myself to seasons for which I had enough data to work with and in which there is enough overlap between the various leagues that I can be reasonably confident of my normalization rates. I’ve started in 1983 for the AHL, 1988 for the Swedish Elitserien, 1989 for the Finnish SM-liiga, 1992 for the IHL, 1999 for the Czech Republic League, 2001 for the German Deutsche Eishockey Liga and 2003 for the Russian Elite League / KHL. I have also included all the Canada Cups, World Cups and the 2006 and 2010 Olympics. More seasons will get added as I find the time.
The process of normalization, of course, is the most interesting part. GVT naturally normalizes to 3 goals a game, and I normalized goals and assists in the same way. I also normalized for schedule length, although because of the huge disparity in schedule lengths between various leagues, and the different levels of variance inherent in each, I could not normalize every league to 82 games: what would I do with an 8-game Olympic tournament? My compromise was to treat any season shorter than 70 games as if it were 70 games long, so if a league had a 50-game season, I normalized a full 50 games played to 50 / 70 * 82 ≈ 59 games. It’s not perfect, but it’s close enough.
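For anyone who wants to play with the schedule adjustment, here is a minimal sketch of how I read the rule above. The function name `normalize_games` and the constants are made up for illustration, and I am assuming that seasons of 70 games or more are simply scaled to 82; the database itself may handle edge cases differently.

```python
NHL_SCHEDULE = 82   # target schedule length
MIN_SCHEDULE = 70   # shorter seasons are treated as if they were this long


def normalize_games(games_played: float, season_length: int) -> float:
    """Scale games played to an 82-game schedule.

    Assumption: the denominator is the league's season length, but never
    less than 70 games, so very short seasons are not blown up to 82.
    """
    if season_length <= 0:
        raise ValueError("season_length must be positive")
    denominator = max(season_length, MIN_SCHEDULE)
    return games_played * NHL_SCHEDULE / denominator


if __name__ == "__main__":
    # A full 50-game season, as in the example in the text.
    print(round(normalize_games(50, 50)))
    # An 8-game Olympic tournament stays small instead of becoming 82 games.
    print(round(normalize_games(8, 8), 1))
```

The floor is what keeps a short tournament from being treated like a full season: a player who appears in all 8 Olympic games ends up with roughly 9 normalized games rather than 82.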
The second part was to normalize for league difficulty. Past approaches, especially the best-known ones by Hawerchuk, normalize by games played. This is the correct approach when doing projections for the majority of players, as good players in lower leagues will often be marginal players in the NHL. However, I chose to use a translation system that was more accurate for elite players; to do this, I needed to normalize by ice time instead of by games played. Obviously, I don’t have ice time numbers for any league but the NHL, but I do have a simple algorithm to estimate ice time based on basic statistics. Like the schedule adjustment, it’s not perfect, but it’s close enough. The upshot is that my normalization factors are slightly higher than Hawerchuk’s and track good players better but weaker players worse.
I’ll be adding more information on this database in the coming weeks. For now, numbers junkies, enjoy!