The League of Extraordinary Statisticians: Recording Bias

The League of Extraordinary Statisticians (LOES) is a weekly forum bringing together top analytical minds in the hockey world to answer a variety of questions that straddle the line between stats analysis and something you might hear floating around section 304.  They have agreed to answer these questions in a few paragraphs or less, and with minimal formulae.  Because this is a forum, we'd encourage you to use the comments section to answer the questions yourselves, or to discuss or debate the answers given.

The LOES is not meant to represent the entirety of the hockey stats community.  There are a number of people who were either too busy or too difficult to contact for the purposes of the forum.

This post, the final instalment of the LOES for the 2010-11 NHL season, is a grand opportunity to sew things up with a little troubleshooting.  You see, statisticians are so frequently dealing with the various challenges of data collection, exploring new ideas for metrics, and defending their approaches that they don't often address the nagging thorn in the side that is recording bias (Gabe has, though, multiple times).  Yet the fact that one arena counts shots or blocked shots or takeaways very stingily and another counts them like they're going out of style poses a difficult challenge for analysis.  This week, as a sort of New Year's resolution, our LOES is going to propose some ideas for improving data recording, in the hope that we can make advances in these areas in the future.

The question for this week: One of the issues affecting stats analysis is home recording bias. What needs to be done to address/fix/correct for this problem?

Home bias only really applies to data that isn't discrete and objective.  For instance, a goal is a goal - a biased scorekeeper can't record a goal as a save, nor vice versa.  Same thing goes for penalties.  But events like shots and hits are partly subjective, and are recorded differently from arena to arena.
 
I'm very confident that most (if not all) NHL teams have their own set of scorekeepers who review every game tape and record their own totals for the subjective events, using their own definitions.  It's really not that expensive a thing for a team to do, especially considering how valuable the data is to them.  If that information could be made available to the public, that would be awesome, even if it wasn't until the end of the season, when there wouldn't be as much of a competitive advantage to keeping it secret.
 
In a perfect world the NHL would do this for everyone.  That is, have objective and clear-cut definitions for all types of events, and have their own panel of impartial scorekeepers reaching a consensus on each game, and then make that data freely available to all teams and the public.  I certainly think that will happen some day, as the league has really improved by leaps and bounds in what information they record, and how they record it.
 
Until then, what to do?  Often you want as large a sample size as possible, but if the home bias is bad enough you might be better off just using road data only.  Ideally, if you can pin down the home bias with any accuracy, you can adjust the home data appropriately.  The alternative is to organise the fans to review game tape themselves, something that's already being done with respect to scoring chances.

- Rob Vollman, Hockey Prospectus
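
One way to "pin down the home bias," as Vollman puts it, is to compare how many events get recorded per game in a team's home rink with how many get recorded in that same team's road games. The sketch below is a minimal illustration, not anyone's published method: it assumes per-game totals for both teams combined (hits, say) and treats a team's road games, spread across the other arenas, as a fair baseline. The names Game, arena_factors and adjust are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    events: int  # events (e.g. hits) recorded for BOTH teams in this game


def arena_factors(games):
    """Per-game events recorded in each team's home rink, divided by
    per-game events recorded in that team's road games."""
    home_sum, home_n = defaultdict(int), defaultdict(int)
    road_sum, road_n = defaultdict(int), defaultdict(int)
    for g in games:
        home_sum[g.home] += g.events
        home_n[g.home] += 1
        road_sum[g.away] += g.events
        road_n[g.away] += 1
    factors = {}
    for team in home_sum:
        if road_n[team] == 0 or road_sum[team] == 0:
            continue  # no road baseline, so no factor for this rink
        home_rate = home_sum[team] / home_n[team]
        road_rate = road_sum[team] / road_n[team]
        factors[team] = home_rate / road_rate
    return factors


def adjust(count, arena, factors):
    """Scale a count recorded in `arena` by that rink's factor (1.0 if unknown)."""
    return count / factors.get(arena, 1.0)


# Toy data: team A's rink records twice as many events as the others.
games = [
    Game("A", "B", 40), Game("A", "C", 40),
    Game("B", "A", 20), Game("C", "A", 20),
    Game("B", "C", 20), Game("C", "B", 20),
]
factors = arena_factors(games)
print(factors["A"], adjust(40, "A", factors))  # 2.0 20.0
```

This is a single-pass estimate, and the toy data shows its weakness: B and C come out at about 0.67 even though their rinks are neutral, because their road baselines include A's generous rink. More careful approaches iterate the adjustment or fit a regression.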

The first fix would be for the NHL to actually care about the quality of the data.  There is little evidence that this is the case.  If there is no sense that something is broken, it won't get fixed.  The NHL should care because this data has considerable potential to help with our understanding of the game and the NHL gets our extraordinary analytic insight largely for free.

 

To fix it requires one or more changes to training, quality control, peer review and/or technology.  There is not much of any of this going on now.  There are some system edits that could help.  Multiple scorers might help in combination with system changes to average or flag discrepancies.  The cheapest solution could well be to average out the problem by forcing the RTSS scorers to travel.

 

A high-tech solution would be to automate much of the process.  We mainly care about puck movements, including shots.  In fact, we only have shots but need to understand the other movements.  Football (the global kind) has developed digital systems to track player and ball movements.  Hockey could do that...

 

- Alan Ryder, Hockey Analytics
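
Ryder's idea of multiple scorers whose counts are averaged, with discrepancies flagged, could look something like the sketch below. It assumes two or more scorers count the same game independently; reconcile and its 20% max_rel_spread threshold are hypothetical choices, not anything the RTSS system is known to do.

```python
from statistics import mean


def reconcile(counts_by_event, max_rel_spread=0.2):
    """Average independent scorers' counts for one game and flag any event
    type where they disagree by more than `max_rel_spread` of the average.

    counts_by_event maps an event type (e.g. "hits") to a list of counts,
    one per scorer.  Returns {event: (consensus, flagged)}.
    """
    result = {}
    for event, counts in counts_by_event.items():
        consensus = mean(counts)
        spread = max(counts) - min(counts)
        # a zero consensus means every scorer recorded zero, so nothing to flag
        flagged = consensus > 0 and spread / consensus > max_rel_spread
        result[event] = (consensus, flagged)
    return result


game = {"shots": [30, 31, 29], "hits": [18, 30, 24]}
for event, (consensus, flagged) in reconcile(game).items():
    print(event, consensus, "REVIEW" if flagged else "ok")
```

Here the shot counts agree closely while the hit counts do not, so only hits would be sent back for video review. A median instead of a mean would be less sensitive to one outlying scorer.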

I've believed for years that there should be a development and certification program for Off-Ice Officials (OIOs).  We spend millions of dollars developing players and millions of dollars developing referees, so why not institute a training program that helps standardize, across all NHL arenas, the interpretation of the various actions that go into the statistical record?  Such a program should include both the technical instructions on the RTSS system and video examples of each element, both affirmative and negative.  I don't think the OIOs would have to be shipped in for an annual summer conference; just put the curriculum onto a DVD.  This would also allow each OIO to watch and repeat the examples at his or her own pace.  Once the course was completed, the league could then test them online using other video examples.

- Marc Foster, Hockey Prospectus
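
Scoring the online test Foster describes could be as simple as comparing a trainee's call on each clip with a reference answer key. A minimal sketch, assuming one label per clip from each side: raw percent agreement, plus Cohen's kappa, a standard statistic that discounts the agreement two raters would reach by chance alone. The function name agreement and the labels are hypothetical.

```python
from collections import Counter


def agreement(trainee_calls, reference_calls):
    """Return (percent agreement, Cohen's kappa) for two equal-length
    sequences of labels, e.g. "shot" / "miss" / "block" per clip."""
    if not trainee_calls or len(trainee_calls) != len(reference_calls):
        raise ValueError("need two non-empty sequences of the same length")
    n = len(trainee_calls)
    observed = sum(t == r for t, r in zip(trainee_calls, reference_calls)) / n
    t_freq, r_freq = Counter(trainee_calls), Counter(reference_calls)
    # agreement expected if each side labelled at random with its own frequencies
    expected = sum(t_freq[c] * r_freq[c] for c in t_freq) / n ** 2
    if expected == 1:  # both sides used one identical label throughout
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


trainee = ["shot", "shot", "miss", "block"]
answer_key = ["shot", "miss", "miss", "block"]
print(agreement(trainee, answer_key))
```

A kappa near 1 means agreement well beyond chance; near 0 means no better than guessing. A certification cutoff would still have to be chosen by the league.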

Scorer bias is a problem in hockey sabermetrics.  If the initial data is garbage then we will get garbage out.  This is a problem that goes beyond a home scorer bias.  Some cities score hits or shots on goal or other stats more liberally than others.  
 
One solution often used is to look at road games only.  This averages a player's stats across the 29 other cities in the league.  But it reduces the sample size and doesn't capture the whole story.  Using stats as of Sunday night, Danny Briere has 40 road points and only 23 home points.  Mike Richards, his teammate, has 40 home points and only 23 road points.  If we look only at home games, Richards appears far more offensively productive, and if we look only at road games, Briere does, when in reality they have produced the same number of points.  We will get flukes like these with any other stat if we use only road numbers.
 
The long-term solution is to better normalize statistics, with internal testing to make sure that a stat is recorded the same way in one city as in another.  In baseball, for example, they have used machines to track balls and strikes and rated umpires on how often their calls agree with the machines.  The NHL has little motivation to see this happen.  Do they see any increase in revenue by better standardizing their stats?  So I won't hold my breath waiting.

- Greg Ballentine, The Puck Stops Here at Kukla's Korner
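
How big a fluke is Ballentine's 40/23 split? One rough check is an exact two-sided sign test, under the simplifying assumption that each point is equally likely to come at home or on the road, independently of the others (not strictly true, since points cluster within games). split_p_value is a hypothetical helper for illustration.

```python
from math import comb


def split_p_value(home, road):
    """Two-sided exact binomial (sign) test of a home/road split,
    assuming each point lands at home or on the road with p = 0.5."""
    n = home + road
    k = min(home, road)
    # P(X <= k) for X ~ Binomial(n, 0.5); by symmetry this equals P(X >= n - k)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(split_p_value(40, 23), 3))
```

For 40/23 this comes out at roughly 0.04, so across a league of several hundred players, some will show splits this lopsided by chance alone, with or without any scorer bias.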

Ideally you'd take the human element out of it by electronically tracking the puck on the ice.

- Corey Pronman, Hockey Prospectus

I just want to start by saying that the cooperative efforts of the various hockey statisticians recording scoring chances this year has been absolutely awesome.  One-third to one-half of the league is having its scoring chance data recorded by a number of independent fans willing to watch their games with a pad and pencil, and it's those kinds of people that remind me of the things I've always liked about the hockey stats community: a willingness to put in the work paired with a deep love for the game.  Rock on.

A lot of our LOES members agree, and I think you can tell from their responses, that recording bias is a frustrating problem that doesn't appear to have easy solutions.  But it's possible that the answer lies in Alan Ryder's response, in particular where he talks about incentive.  Can you imagine how many applicants you'd have if the NHL undertook an initiative to hire a few more scorers to cross-check (pardon the pun) recording data, and instituted a bit more of an official process (such as what Marc Foster suggests)?  The word could even be put out specifically through sites like HP and BTN and Hockey Analytics, just to catch the people who are interested in recording data.  As Ryder notes, there can be benefits to better recording; I'd add that statisticians would be in just as good a position to offer suggestions for improving the quality of play as more than a few GMs and owners.

Or we could just go robot on this mother, though the NHL might want its puck back when it goes into the crowd.  Kidding, they can do that pretty cheap...but they could also track you back to your home.  Once again...kidding.  Though we do run the risk of one of the pucks generating human-like emotions, and yearning for a better life outside of its cold, violent existence...

At the base level, we need to be aware of recording bias, and we should be vigilant about identifying the various recording biases across different arenas.  The better we work together to reduce recording bias, the better we can analyze and understand the game.

Comments

“Though we do run the risk of one of the pucks generating human-like emotions, and yearning for a better life outside of its cold, violent existence…”

That’s a great idea for a Sci Fi series

by ThrashersRecaps on Apr 5, 2011 11:53 AM EDT

I’m with Rob and Alan on this one: have the NHL and NHLPA establish criteria as clear as possible, and hire independent, traveling off-ice scorers who are monitored and peer reviewed. Even then there will be some lack of consistency, because you can talk about all the “objective and clear-cut” definitions you want, but the more you try to clarify something, the more you open up potential loopholes.

I would also like to apply for the job of independent, traveling off-ice scorer.

Glen Sather is a Hockey Genius.

http://twitter.com/ThGeneralissimo
http://twitter.com/poplosertwit

by poploser on Apr 5, 2011 1:47 PM EDT

I’m surprised the NHLPA doesn’t push for something like this in their CBA negotiations.

When something like save percentage is a driver of performance bonus achievement, I would think the NHLPA would want to make certain it’s recorded in a consistent and reliable manner.

by Bourque77 on Apr 5, 2011 2:30 PM EDT

Why?

ATM, the way it’s hired makes it hard for anyone to tell when a goalie is sucking, and allows goalies who get favorable situations to stay employed.

Yeah it hurts some guys, but it helps others. So it’s probably not much of a concern.

by garik16 on Apr 5, 2011 8:58 PM EDT

The NHLPA doesn’t seem to have a problem at all with merit pay and seems to encourage it, so I’d think they’d like the idea of those who have earned the pay getting it.

I also think the official stats (at least for goalies) don’t have that much of an impact on NHL GMs’ decisions about whether a goalie keeps his job or not. Yann Danis had a couple of good seasons by save percentage in backup minutes and is in the KHL now. His .923 last year should have been a way to get him more playing time this year, not less, I would think.

by Bourque77 on Apr 5, 2011 9:20 PM EDT

Aren’t performance bonuses basically prohibited in the CBA?

by poploser on Apr 5, 2011 9:04 PM EDT

At least players under ELCs are eligible for performance bonuses.

by Bourque77 on Apr 5, 2011 9:13 PM EDT

only ELCs and guys who’ve been hurt a lot in the recent past.

My blog and Twitter, featuring coverage of the most frustrating team in the NHL
If you don't know how to use Timeonice, read this.
Behindthenet quick link to QoC/QoT/Corsi/PDO/Zonestarts
"Numbers don't lie, they just don't agree with you"--George E. Ays
If I reference a lot of stats, just assume I haven't seen anything to contradict or invalidate them.

by red army line on Apr 5, 2011 10:56 PM EDT

On this point, the variance could be kept to a minimum by limiting the number of games at any one arena and/or of any one team that any OIO covered. For example, with 82 games per team, 30 teams and 30 OIOs, each OIO would average fewer than 3 games per team. If no one OIO covered more than 4 (more difficult to schedule, I’m sure) to 6 (more scheduling flexibility) games of any one team, a lot of the variation that could be introduced would be spread evenly around the league, and no one team or player would get the benefit of any one scorer’s quirks.

by FrankG929 on Apr 5, 2011 6:33 PM EDT
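
FrankG929's arithmetic checks out: a full season is 30 × 82 / 2 = 1,230 games, or 41 per OIO, which is 82 team-games spread over 30 teams, about 2.7 per team. His cap can be sketched as a greedy assignment that gives each game to the least-busy scorer who hasn't yet reached the cap for either club. This assumes one scorer per game and a schedule given as (home, away) pairs; assign_scorers is hypothetical, and the greedy rule is a simple heuristic, not an optimal scheduler. With a tight cap it can run out of options, in which case it raises an error rather than quietly breaking the cap.

```python
from collections import defaultdict
from itertools import combinations


def assign_scorers(schedule, num_scorers, cap):
    """Give each (home, away) game to one scorer so that no scorer works
    more than `cap` games involving any single team."""
    per_team = [defaultdict(int) for _ in range(num_scorers)]
    load = [0] * num_scorers
    assignment = []
    for home, away in schedule:
        candidates = [s for s in range(num_scorers)
                      if per_team[s][home] < cap and per_team[s][away] < cap]
        if not candidates:
            raise ValueError(f"no scorer left under the cap for {home} vs {away}")
        s = min(candidates, key=lambda i: load[i])  # least-busy eligible scorer
        per_team[s][home] += 1
        per_team[s][away] += 1
        load[s] += 1
        assignment.append(s)
    return assignment, per_team


# Toy league: 30 teams, every pair meets three times (87 games per team).
schedule = [(a, b) if rnd % 2 == 0 else (b, a)
            for rnd in range(3) for a, b in combinations(range(30), 2)]
assignment, per_team = assign_scorers(schedule, num_scorers=30, cap=6)
print(len(assignment), max((c for d in per_team for c in d.values()), default=0))
```

With a cap of 6 this toy schedule can never get stuck: before any game, at most 14 scorers can have used up their 6 games with either club (86 games / 6), so at least 2 of the 30 remain eligible. A cap of 4 has no such guarantee.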

i do a fuckload of amateur off-ice officiating (timekeeping, scorekeeping, shot counting, blah blah blah), and one thing i’ve learned is that off-ice officials, like all on-ice officials and all hockey fans generally, have slightly varying definitions of every hockey event. one ref might call something a hook that another would let slide, one counter might call something a hit that another would barely notice. no matter how much you try to standardize the definitions, people retain an incredibly strong commitment to their interpretation, and oversight can only mitigate the degree of counting bias rather than eliminate it.

my theory is that for micro-counting to be done with any semblance of accuracy, it has to be done ex-post-facto, by video, at some kind of centralized location. basically, the equivalent of an office where people watch the games after they’re finished, with the luxury (and expectation) of going back and rewatching unclear sections. you’d have to wait a little longer after every game to get the data, but it would be vastly better data. if you can’t have one person doing all the counting, the next best thing is to be able to assign counters randomly, so the bias is distributed among all teams. it’s not necessary to send people traveling to do it live, though- i can’t think of a reason why a live count would be more accurate, and it’s hella more expensive.

by ephie on Apr 6, 2011 6:44 AM EDT
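
ephie's point about random assignment can be checked with a toy simulation. Assume, purely for illustration, that each counter has a fixed personal bias (a multiplier drawn between 0.8 and 1.2, so 1.1 means 10% more events than a neutral count), and look only at each team's 41 home games. If the same home counter works every game, each team inherits one counter's quirks in full; if counters are drawn at random each night, each team's average bias is pulled toward the league average. team_bias_spread and all its parameters are hypothetical.

```python
import random
from statistics import pstdev


def team_bias_spread(num_teams=30, home_games=41, randomize=False, seed=0):
    """Spread (population std. dev.) across teams of the average counter
    bias applied to each team's home games."""
    rng = random.Random(seed)
    # one hypothetical bias multiplier per counter, one counter per arena
    bias = [rng.uniform(0.8, 1.2) for _ in range(num_teams)]
    team_avgs = []
    for team in range(num_teams):
        if randomize:
            counters = [rng.randrange(num_teams) for _ in range(home_games)]
        else:
            counters = [team] * home_games  # the arena's own counter every night
        team_avgs.append(sum(bias[c] for c in counters) / home_games)
    return pstdev(team_avgs)


print("fixed: ", round(team_bias_spread(), 3))
print("random:", round(team_bias_spread(randomize=True), 3))
```

Under these assumptions the random-assignment spread should come out several times smaller (on the order of the square root of 41), though the exact numbers depend on the seed. Randomizing doesn't remove any counter's bias; it only keeps it from landing on the same team every night.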
