The League of Extraordinary Statisticians: Recording Bias

The League of Extraordinary Statisticians (LOES) is a weekly forum bringing together top analytical minds in the hockey world to answer a variety of questions that straddle the line between stats analysis and something you might hear floating around section 304.  They have agreed to answer these questions in a few paragraphs or less, and with minimal formulae.  Because this is a forum, we'd encourage you to use the comments section to answer the questions yourselves, or to discuss or debate the answers given.

The LOES is not meant to represent the entirety of the hockey stats community.  There are a number of people who were either too busy or too difficult to contact for the purposes of the forum.

This post, the final instalment of the LOES for the 2010-11 NHL season, is a grand opportunity to sew things up with a little troubleshooting.  You see, statisticians are so frequently dealing with the various challenges of data collection, exploring new ideas for metrics, and defending their approaches that they don't often address the nagging thorn-in-the-side that is recording bias (Gabe has, though... multiple times).  Yet the fact that one arena counts shots or blocked shots or takeaways very stingily and another like they're going out of style poses a difficult challenge for analysis.  This week, as a sort of New Year's Resolution, our LOES is going to propose some ideas for improving data recording, with the hope that we can make advances in these areas in the future.

The question for this week: One of the issues affecting stats analysis is home recording bias. What needs to be done to address/fix/correct for this problem?

Home bias only really applies to data that isn't discrete and objective.  For instance, a goal is a goal - a biased scorekeeper can't record a goal as a save, nor vice versa.  Same thing goes for penalties.  But events like shots and hits are partly subjective, and are recorded differently from arena to arena.
I'm very confident that most (if not all) NHL teams have their own set of scorekeepers that review every game tape and record their own totals for the subjective events, using their own definitions.  It's really not that expensive a thing for a team to do, especially considering how valuable the data is for them.  If that information could be made available to the public, that would be awesome, even if it wasn't until the end of the season when there wasn't as much of a competitive advantage to keeping it secret.
In a perfect world the NHL would do this for everyone.  That is, have objective and clear-cut definitions for all types of events, and have their own panel of impartial scorekeepers reaching a consensus on each game, and then make that data freely available to all teams and the public.  I certainly think that will happen some day, as the league has really improved by leaps and bounds in what information they record, and how they record it.
Until then, what to do?  Often you want as large a sample size as possible, but if the home bias is bad enough you might be better off just using road data only.  Ideally, if you can pin down the home bias with any accuracy, you can adjust the home data appropriately.  The alternative is to organise the fans to review game tape themselves, something that's already being done with respect to scoring chances.

- Rob Vollman, Hockey Prospectus
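
Vollman's idea of pinning down the home bias and adjusting the home data can be sketched roughly as follows. This is only an illustration, not a method any contributor describes: it assumes a club's road games give a fair baseline for how often an event really happens, and the names (`EventCounts`, `rink_factor`, `adjust_home_count`) and all the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Both-team totals of one subjective event (say, hits) for one club.

    All fields are hypothetical inputs; nothing here comes from real NHL data.
    """
    home_events: int  # events credited to both teams in the club's home games
    home_games: int
    road_events: int  # events credited to both teams in the club's road games
    road_games: int


def rink_factor(counts: EventCounts) -> float:
    """Ratio of the per-game event rate at home to the rate on the road.

    A value above 1.0 suggests the home scorer counts generously,
    below 1.0 suggests a stingy scorer (under the stated assumption).
    """
    home_rate = counts.home_events / counts.home_games
    road_rate = counts.road_events / counts.road_games
    return home_rate / road_rate


def adjust_home_count(raw_home_count: float, factor: float) -> float:
    """Scale a raw home total back toward the road baseline."""
    return raw_home_count / factor


# Made-up season: 50 hits per game recorded at home, 40 per game on the road.
counts = EventCounts(home_events=2050, home_games=41,
                     road_events=1640, road_games=41)
factor = rink_factor(counts)                 # 50 / 40 = 1.25
adjusted = adjust_home_count(100, factor)    # 100 home hits -> 80.0
print(f"rink factor: {factor:.2f}, adjusted home hits: {adjusted:.1f}")
```

Counting both teams' events is a deliberate choice in this sketch: it keeps a club's own style of play from being mistaken for scorer bias, since a heavy-hitting team hits at home and on the road alike.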

The first fix would be for the NHL to actually care about the quality of the data.  There is little evidence that this is the case.  If there is no sense that something is broken, it won't get fixed.  The NHL should care because this data has considerable potential to help with our understanding of the game and the NHL gets our extraordinary analytic insight largely for free.


To fix it requires one or more changes to training, quality control, peer review and/or technology.  There is not much of any of this going on now.  There are some system edits that could help.  Multiple scorers might help in combination with system changes to average or flag discrepancies.  The cheapest solution could well be to average out the problem by forcing the RTSS scorers to travel.


A high-tech solution would be to automate much of the process.  We mainly care about puck movements, including shots.  In fact, we only have shots but need to understand the other movements.  Football (the global kind) has developed digital systems to track player and ball movements.  Hockey could do that...


- Alan Ryder, Hockey Analytics
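
Ryder's suggestion of multiple scorers combined with system changes "to average or flag discrepancies" might look something like the sketch below. The function name `reconcile`, the scorer labels, the tolerance of two events, and the counts are all assumptions for illustration; nothing here reflects how the RTSS system actually works.

```python
from statistics import mean


def reconcile(counts_by_scorer: dict[str, int],
              tolerance: int = 2) -> tuple[float, bool]:
    """Average independent scorers' counts and flag large disagreements.

    Returns (consensus, needs_review). The game is flagged for review when
    the gap between the highest and lowest count exceeds `tolerance`,
    a hypothetical threshold chosen only for this example.
    """
    values = list(counts_by_scorer.values())
    spread = max(values) - min(values)
    return mean(values), spread > tolerance


# Made-up hit counts from three scorers watching the same game.
game_hits = {"scorer_a": 21, "scorer_b": 23, "scorer_c": 30}
consensus, needs_review = reconcile(game_hits)
print(f"consensus: {consensus:.1f}, needs review: {needs_review}")
```

Averaging alone would hide the fact that one scorer saw nine more hits than another, which is why the sketch returns the flag alongside the consensus number.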

I've believed for years that there should be a development and certification program for Off-Ice Officials (OIOs).  We spend millions of dollars developing players and millions of dollars developing referees, so why not institute a training program that helps standardize, across all NHL arenas, the interpretation of the various actions that go into the statistical record?  Such a program should include both the technical instructions on the RTSS system and video examples of each element, both affirmative and non.  I don't think that the OIOs would have to be shipped in for an annual summer conference, just put the curriculum onto a DVD.  This would also allow the OIO to watch and repeat the examples at his/her own pace.  Once completed, the league could then test them online using other video examples.

- Marc Foster, Hockey Prospectus

Scorer bias is a problem in hockey sabermetrics.  If the initial data is garbage then we will get garbage out.  This is a problem that goes beyond a home scorer bias.  Some cities score hits or shots on goal or other stats more liberally than others.  
One solution often used is to look at road games only.  This averages a player's stats across the 29 other cities in the league.  But it reduces sample size and doesn't capture the whole story.  Using stats as of Sunday night, Danny Briere has 40 road points and only 23 home points.  Mike Richards, his teammate, has 40 home points and only 23 road points.  If we look at only home games, Richards appears far more offensively productive and if we look at road games Briere appears more offensively productive, when the reality is they have produced the same number of points.  We will get flukes like these with any other stat if we use only road numbers.
The long-term solution is to better normalize statistics: to have internal testing to make sure that a stat in one city is recorded the same way in another city.  In baseball, for example, they have had machines recording balls and strikes and have rated umpires on their agreement with the machines.  The NHL has little motivation to see this happen.  Do they see any increase in revenue by better standardizing their stats?  So I won't hold my breath waiting.

- Greg Ballentine, The Puck Stops Here at Kukla's Korner
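
Ballentine's Briere and Richards example is easy to check with a few lines of arithmetic. The point totals below are the ones he quotes; the variable names and layout are just for illustration.

```python
# Home/road point splits as quoted in Ballentine's response.
splits = {
    "Briere": {"home": 23, "road": 40},
    "Richards": {"home": 40, "road": 23},
}

# Full-season totals: both players come out at 63 points.
totals = {name: s["home"] + s["road"] for name, s in splits.items()}

# Share of each player's points that a road-only analysis would keep.
road_share = {name: s["road"] / totals[name] for name, s in splits.items()}

for name in splits:
    print(f"{name}: total {totals[name]}, "
          f"road-only view sees {splits[name]['road']} "
          f"({road_share[name]:.0%} of his points)")
```

A road-only view keeps roughly 63% of Briere's production but only about 37% of Richards's, even though their totals are identical, which is the fluke he is warning about.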

Ideally you'd take the human element out of it by electronically tracking the puck on the ice.

- Corey Pronman, Hockey Prospectus

I just want to start by saying that the cooperative efforts of the various hockey statisticians recording scoring chances this year have been absolutely awesome.  One-third to one-half of the league is having its scoring chance data recorded by a number of independent fans willing to watch their games with a pad and pencil, and it's those kinds of people that remind me of the things I've always liked about the hockey stats community: a willingness to put in the work paired with a deep love for the game.  Rock on.

A lot of our LOES agree, and I think you can tell by their responses, that recording bias is a frustrating problem that doesn't appear to have easy solutions.  But it's possible that the answer lies in Alan Ryder's response, in particular where he talks about incentive.  Can you imagine how many applicants you'd have if the NHL undertook an initiative to hire a few more scorers to cross-check (pardon the pun) recording data, and instituted a bit more of an official process (such as what Marc Foster suggests)?  The word can even be put out specifically through sites like HP and BTN and Hockey Analytics, just to catch the people who are interested in recording data.  As Ryder notes, there can be benefits to better recording; I'd add that statisticians would be in just as good a spot to offer suggestions for improving the quality of play as no small number of GMs and owners.

Or we could just go robot on this mother, though the NHL might want its puck back when it goes into the crowd.  Kidding, they can do that pretty cheap...but they could also track you back to your home.  Once again...kidding.  Though we do run the risk of one of the pucks generating human-like emotions, and yearning for a better life outside of its cold, violent existence...

At the base level, we need to be aware of recording bias, and we should be vigilant about identifying the various recording biases across different arenas.  The better we work together to reduce recording bias, the better we can analyze and understand the game.