The League of Extraordinary Statisticians: Sample Size

Steve Mason had a wicked awesome year in 2008-09. We loved him. We bought his jersey. We named our children Steve Mason Wendorf and Mason Steve Wendorf and wrote little stories with matriarchal characters named Ma Steveson (okay, not everybody did that last one, but I know one person who did). But in 2009-10 Steve Mason, so it seemed, disappeared, replaced by some kind of evil twin sent to undermine fantasy hockey seasons and the good Columbus Blue Jackets name. Where did he go? Surely that second season was the mistake, and 2010-11 would be better. Then 2010-11 rolled around and that damned evil twin was still there wearing Mason’s jersey. Maybe at that point we finally realized that that really is Steve Mason, maybe not, but we were pretty sure that whatever happened in 2008-09 was the anomaly.

Ladies and gentlemen, a small sample size can be a cruel seductress. It can lead us to believe that Craig Simpson is a 56-goal scorer or Andrew Raycroft the second coming of Gerry Cheevers. It can make us believe that Joe Mauer is a 30-homer power hitter, a conclusion on which I place an annual bet with a friend to run around my block naked should it be proven true. If nothing else, statistics teach us to proceed with caution, and they inherently seek more data before giving us solid conclusions. You can certainly run in the face of that, and as Gabe has pointed out, GMs and coaches sometimes make decisions without the luxury of large samples, casting themselves into a sea of luck both good and bad.

This week, the LOES is going to give us an idea of what kind of sample we need before we can be pretty sure a player, to paraphrase Dennis Green, is what we think he/she is.

This week’s question: How much performance data do you need on an NHL player before you feel that their talent can be assessed accurately? Does it need to be strictly NHL experience?

It totally depends on what aspect of the player’s ability is being measured. For instance, a player makes so few shoot-out attempts in a season that you could watch his entire career and still not have a very accurate assessment of his true abilities.

It comes down to more than just the number of events, though. The persistence and/or variability of the event is also important. Speaking in broad terms, attributes that are highly variable game-to-game, month-to-month or season-to-season could be influenced heavily by transient factors (e.g. luck), and therefore you may need to witness far more events before you can be sure of the actual underlying ability.

And no, the data does not necessarily have to be NHL data, even though NHL data is easier to get. We’ve come a long way in the fields of league equivalencies and player comparables, so depending on what you’re looking at, you can sometimes be better off including non-NHL data in your analysis.

– Rob Vollman, Hockey Prospectus

It depends how accurately you want to try to assess a player’s talent. The more games, the more accurately you can assess things. The caveat is talent levels change over time. Players get better as they gain NHL experience. Players get worse as they suffer injury and age. We can never perfectly know a player’s talent level.

The number of games needed to assess a player also depends upon the consistency the player shows. It is easier to draw conclusions about a player if he consistently plays at the same level over time. If a player is less consistent, it takes longer to reach the same certainty.

The model that I have in mind is that each player has a projection along with an uncertainty in that projection. The uncertainty is based in part upon the number of games a player has played, the circumstances of them and the consistency within. The accuracy in the assessment or uncertainty depends on too many factors other than games to answer this question with a number.

– Greg Ballentine, The Puck Stops Here at Kukla’s Korner
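
One way to picture the "projection plus uncertainty" idea in the answer above is a simple shrinkage estimate. The sketch below is only an illustration, not Greg's actual model: it assumes a normal approximation, a hypothetical league-average prior (`prior_mean`, `prior_sd`), and made-up per-game point totals. It ignores the circumstances of each game and the fact that talent changes over time.

```python
from statistics import mean, variance


def project(per_game, prior_mean, prior_sd, min_var=1e-6):
    """Blend a player's per-game results with a prior (normal approximation).

    Returns (projection, uncertainty). More games and steadier play both
    shrink the uncertainty. All inputs here are illustrative assumptions.
    """
    n = len(per_game)
    # Game-to-game variance stands in for "consistency"; floor it to avoid
    # dividing by zero when every game is identical.
    obs_var = max(variance(per_game), min_var)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * mean(per_game)
    )
    return post_mean, post_var ** 0.5


# Two hypothetical players with the same 0.7 points per game over 10 games.
steady = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
streaky = [3, 0, 0, 0, 3, 0, 0, 0, 1, 0]

for name, games in (("steady", steady), ("streaky", streaky)):
    proj, unc = project(games, prior_mean=0.5, prior_sd=0.3)
    print(f"{name}: projection {proj:.2f} +/- {unc:.2f} points per game")
```

Under these assumptions the streaky player is pulled closer to the league-average prior and carries a wider uncertainty than the steady one, even though both have the same raw scoring rate.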

For most players with offensive talent, I’d say you need 150 junior games, or for high school players, maybe 40 college games:

http://www.hockey-reference.com/draft/NHL_2003_entry.html

There were maybe 80 draft-eligible players with the capacity for a real NHL career that season. Of the first 29 picks, 28 were good, without anyone seeing them play a single North American pro game. Obviously some of the picks weren’t perfect – Shea Weber falling to the second round? – so figuring out a player’s ceiling is a lot tougher, and probably takes three NHL seasons most of the time.

– Gabe Desjardins, behindthenet.ca and, of course, Behind the Net

This is a great question. Many times, people see a young player have a great season and they expect it to be a breakout, where he will continually put up those impressive numbers. This even happens on smaller scales, where a player will start off a season scoring goals at a quick rate.

As far as goal-scoring talent goes, it’s pretty tough. Between five and seven hundred shots should establish a true shooting talent, while three years’ worth of information should be enough to establish the other talents – such as faceoffs, drawing penalties, assists, and others.

– Geoff Detweiler, Broad Street Hockey
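
To get a feel for why several hundred shots are needed, here is a rough sketch of how the margin of error on an observed shooting percentage narrows as shots pile up. It assumes each shot is an independent trial with the same chance of going in and a hypothetical true shooting percentage of 9%; both are simplifications chosen for illustration, not figures from the answer above.

```python
import math


def shooting_pct_margin(shots, true_pct=0.09, z=1.96):
    """Approximate 95% margin of error on an observed shooting percentage.

    Uses the binomial standard error sqrt(p * (1 - p) / n). The default
    true_pct of 9% is an assumed value, not a measured one.
    """
    standard_error = math.sqrt(true_pct * (1.0 - true_pct) / shots)
    return z * standard_error


for shots in (50, 200, 500, 700):
    margin = shooting_pct_margin(shots)
    print(f"{shots:4d} shots: 9.0% +/- {100 * margin:.1f} percentage points")
```

Because the margin shrinks with the square root of the shot count, going from 50 to 200 shots halves it, but each further improvement costs many more shots.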

I think the higher-end talent needs much less time compared to middle and lower tiers. One lower-tier guy comes to mind immediately – the New Jersey Devils’ Vladimir Zharkov. Zharkov’s first season was quite good by the possession metrics. Some of the Devils’ media took notice, but he was mostly considered a redundant fourth liner by the average Devils fan. The average NHL fan didn’t even know who he was.

Zharkov has no pedigree and very little offensive ability shows up in his formative years. His NHLE from his years in the KHL and AHL was weak. He comes to the NHL and dominates possession playing on the Devils’ fourth line against other fourth liners – is it likely that he’s a good player or given his background, was 2009-10 a fluke?

John MacLean thought it was a fluke and sent Zharkov to Albany to start the season. Four days after being named coach of the Devils, Jacques Lemaire called Zharkov back to New Jersey and put him back on the fourth line. As you can see from the Behind The Net player card linked above, he’s not been as good as he was last season: his quality of competition has increased slightly, but he’s not replicating the possession stats from his rookie season. We still have no idea which Zharkov is the real player, though going by four-and-a-half seasons of non-NHL hockey, we can surmise that 09-10 was a fluke.

Why do I bring this up? Consider a guy like Steven Stamkos or Sidney Crosby. If they had posted a season like Zharkov’s during their respective rookie seasons, we would have assumed those numbers to be some sort of line in the sand for established levels of possession. If they had followed it up with a sophomore season similar to Zharkov’s current season, that second season would be considered the fluke because they’ve got the pedigree on their side.

– Derek Zona, The Copper & Blue

So we start with a few "it depends" kind of answers, and end with a rough 3-4 years if looking at a person's peak talent. In baseball, there's been a similar interest in at least 3 years' data, or roughly the point where a player is beyond both "adjusting to the league" and "the league adjusting to the player". In a lot of cases, the player is also up to their age 25 to age 29 seasons, in other words right at their peak age physically.

For the record, I'm a little bit sceptical when it comes to crediting experience outside of the NHL. Unless it's a league like the KHL, where a good number of players are capable of playing in the top 3 lines for an NHL team (or have actually played in the top 3 lines in the NHL), I don't really think the data from European and minor leagues can tell us much about the finer points of a player's game. And it's those finer points that can become glaring weaknesses at the highest level; there are a lot of guys that make it to the NHL on pure boxcars, only to spend most of their time with cush assignments and power-play time because they can't play both ends of the ice. Maybe, as Rob says, that will improve with better data collection, but I will still be pessimistic about what a performance against a lot of non-NHL talent will tell us. For the purposes of drafting, I'd be more comfortable with an NFL Combine-like testing program, on the premise that certain things can be taught, but the base skills need to be there first (plus we could use a more-rigid idea of what constitutes "speed" and "accuracy").

So where do you draw the line for data? I think most of us can agree that one year is a bit sparse, but what about two years' data? How about the use of non-NHL data? Do you think the NHL would benefit from a process similar to the NFL Combine? Do you still dream about Steve Mason?

Option	Votes
A big, juicy 3+ years (or ~700 shots, or ~200 games) of NHL data	33
An adequate 2 years (~400 shots, ~150 games) of NHL data, maybe 1 AHL season	26
A year of NHL, a dash of 2 or so AHL/KHL seasons	2
I’ll take the minor league sampler, 3+ years of small-fry action…that’ll do	0
Give me one year and let me blow ya mind	1

Talking Points