Reviewing NHL Officiating


This past season, around November or December, I began to take note of the officiating in the NHL games I was watching. Specifically, it was the worst I could recall watching in over 30 years. Now, granted, the memory is a tricky mistress and we are all sometimes subject to its illusory deceptions, but how on earth could I reconcile what I was seeing with my eyes with the niggling doubts about my objectivity? Surely the officiating in the NHL wasn’t THAT bad; it was just that I was a long-suffering fan of a team that hasn’t been relevant since before Connor McDavid hit puberty.

So what to do about it?

Beginning in March, I set out on a project to track penalty infractions, missed and called, and categorize them. The idea was to figure out how many penalties weren’t being called and whether one team was being favoured more than another.

My approach was this: record and watch as many games as I could, reviewing the play and recording penalty infractions, missed or called, their type, severity and the player upon whom they had been committed (but NOT the player committing the infraction) and the time of the event.

So, for instance, if player #42 were driven into the boards heavily from behind but not to the extent of being absolutely crushed or risking severe injury, and no penalty were called, I would mark it as a Missed Infraction, Boarding, Type 2 (indicating that it was a reasonable infraction against the rules) and Physical (as opposed to technical fouls such as hooking, tripping, interference) and that it was committed against 42 and the time.

A typical few lines of entry would look like this:

Missed or Called Severity Type Team A Team B Time
M 2p Boarding 42 16:52
C 3t Hooking 18 14:11
M 3p Roughing 12 13:58

The severity of an infraction ranged from 0 to 3: 0 (phantom call), 1 (marginal), 2 (fair or deserving to be called), and 3 (obvious, apparent, blatant). The “t” and “p” refer to either a technical or physical foul, meant to help differentiate the calls and see whether teams were allowed to get away with more of one or the other.

The type of foul can be tricky at times, as some infractions cross over, such as hooking and interference, so I relied heavily on watching and re-watching the play to determine the relative positions of the players involved, whether one, neither or both were in motion at the time, and so on.

One column would list the players for one team, the other for the other team, and the times were most often the times at which the infraction occurred, but in the cases of penalties called, not always the time that the play was whistled and the penalty would take effect. It is a minor variance and one I was comfortable working within.
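For readers who want to keep a similar log, here is a minimal sketch of how one entry could be represented in Python. The class name, field names, and helper functions are assumptions made for illustration, not the spreadsheet layout actually used. In particular, the "A" team label in the example is a placeholder, since the sample rows above do not show which team column the player number sits in.

```python
from dataclasses import dataclass
from typing import Tuple

# Severity scale and foul kinds as described in the article.
SEVERITY_LABELS = {0: "phantom call", 1: "marginal", 2: "fair", 3: "obvious"}
FOUL_KINDS = {"t": "technical", "p": "physical"}


@dataclass(frozen=True)
class Infraction:
    called: bool   # True for "C" (called), False for "M" (missed)
    severity: int  # 0 to 3
    kind: str      # "technical" or "physical"
    foul: str      # e.g. "Boarding"
    team: str      # hypothetical label for the fouled player's team column
    player: int    # number of the player the foul was committed against
    time: str      # game clock as written in the log, e.g. "16:52"


def parse_severity(code: str) -> Tuple[int, str]:
    """Split a code such as '2p' into (2, 'physical')."""
    code = code.strip().lower()
    if len(code) != 2 or not code[0].isdigit():
        raise ValueError(f"unrecognised severity code: {code!r}")
    level, kind = int(code[0]), code[1]
    if level not in SEVERITY_LABELS or kind not in FOUL_KINDS:
        raise ValueError(f"unrecognised severity code: {code!r}")
    return level, FOUL_KINDS[kind]


def make_entry(status: str, code: str, foul: str,
               team: str, player: int, time: str) -> Infraction:
    """Build one log entry from the raw column values."""
    if status not in ("M", "C"):
        raise ValueError(f"status must be 'M' or 'C', got {status!r}")
    severity, kind = parse_severity(code)
    return Infraction(status == "C", severity, kind, foul, team, player, time)


# First sample row from the article; the team label "A" is an assumption.
entry = make_entry("M", "2p", "Boarding", "A", 42, "16:52")
```

Storing the severity number and the technical/physical flag as separate fields makes it easier to filter or group on either one later.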

In addition, I re-read the NHL rulebook, familiarizing myself with the specific designations and criteria for all penalties with regard to actions required, body positioning, and the extent of authority at the referee’s discretion.

So now what on earth was I going to do with all this information?

To begin with, each game took between three and four hours to review, and this is without commercials, whistle breaks, and intermissions. It involved watching and re-watching plays sometimes as much as a dozen times to determine if an infraction did or did not occur.

What I was hoping to derive from this exercise was an idea of the relationship between the called and uncalled events of two teams.

For instance, how often do we hear about “make up calls” or the referees trying to even things up as though balancing a ledger book? The idea that penalties should naturally even out during a game has become so commonplace that fans tend to be incensed if one team has a noted advantage on the box score by evening’s end. But does that really reflect how the two teams are playing? If team A is running players into the boards, spearing stars in scrums after the whistle and targeting players with dangerous hits, while the other team is largely not engaging in those tactics, then ought the penalties at the end of the night be even?

I’m no starry-eyed child in the world. Justice is a human construct that exists almost nowhere in the natural world and even at that our own record as a species on the subject is sketchy at best. But generally in our societies we have at least established a level of acceptable injustice before we reach the point of outrage.

What I wanted to determine was “what is our level of acceptable injustice in the NHL?”

To do this I had to work assiduously at controlling anything that might speak to home town bias in my data collection. I began by distancing myself from the games I reviewed. I never watched the games live, and only looked at them after a week had passed so the result was in the books and several other games had occurred to help wash away any lingering emotions on that particular event.

Following that, once I had collected the data, I began to compile the numbers and subtract events that were marked with a “1” (recall that this was the “marginal” tag I had used). The purpose for this was twofold. First, it helped provide some sober analytical distance from watching the game and would help correct, if only slightly, for the home team bias. Second, it reinforced the expectation of a fan that hockey is a game of speed and contact and that a perfect game wasn’t one without a little hooking and physicality.
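The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: events are simplified to (status, severity, foul) tuples, the function names are made up, and the fourth sample row is hypothetical, added so there is a marginal event to remove.

```python
def drop_marginal(events, marginal_level=1):
    """Return the events whose severity is not the 'marginal' level."""
    return [event for event in events if event[1] != marginal_level]


def tally(events):
    """Count missed ("M") and called ("C") events."""
    counts = {"M": 0, "C": 0}
    for status, _severity, _foul in events:
        counts[status] += 1
    return counts


# The first three rows mirror the sample entries in the article;
# the last row is a hypothetical marginal event.
events = [
    ("M", 2, "Boarding"),
    ("C", 3, "Hooking"),
    ("M", 3, "Roughing"),
    ("M", 1, "Hooking"),
]

kept = drop_marginal(events)
before = tally(events)
after = tally(kept)
```

Comparing `before` and `after` shows how many missed infractions drop out of the count once the marginal ones are set aside.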

A game where every infraction, no matter how small, was called is not one I’d be particularly interested in watching.

There is one critical weakness to this project: the sample size is not terribly large, fewer than 20 regular season games. Still, I believe it provides enough of a base from which we can begin to ask further questions and perhaps (and at this time the very thought of it fills me with dread) build on it for next season.

So now I had my data, my method, and could work on figuring out what the numbers were telling me.

I could find out which players were most often targeted, either overall or by a specific type of infraction.

I could look at score effects to determine whether a team committed more fouls when they were leading, chasing or tied in the game.

I could challenge or confirm some assertions made by fans in regard to officiating and referee bias.

I could determine an average rate of a particular penalty being called, as well as a general rate of calls relative to the rate of missed calls.
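As a rough sketch of that last calculation, the rate at which a foul is called can be taken as called / (called + missed). The code below assumes events are (status, severity, foul) tuples and, as a labeled assumption, skips severity-0 phantom calls, since no real infraction occurred on those plays. The function name and the last two sample rows are hypothetical.

```python
from collections import defaultdict


def call_rates(events):
    """Return {foul: called / (called + missed)}, plus an "ALL" entry."""
    called = defaultdict(int)
    total = defaultdict(int)
    for status, severity, foul in events:
        if severity == 0:
            continue  # assumption: phantom calls are not real infractions
        for key in (foul, "ALL"):
            total[key] += 1
            if status == "C":
                called[key] += 1
    return {key: called[key] / total[key] for key in total}


# The first three rows mirror the sample entries in the article;
# the last two are hypothetical.
sample = [
    ("M", 2, "Boarding"),
    ("C", 3, "Hooking"),
    ("M", 3, "Roughing"),
    ("C", 2, "Hooking"),
    ("M", 2, "Hooking"),
]

rates = call_rates(sample)
```

With a sample this small the numbers mean nothing on their own; the point is only to show how a per-foul rate and an overall rate fall out of the same log.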

Here are some of the questions I posed at the beginning of the exercise and will look to answer during the course of the following articles:

  • Who are the most targeted players, and which players are least likely to draw a penalty?
  • Are some players targeted by specific types of fouls more than others?
  • Do some teams have a general tendency to commit one kind of foul over another?
  • What is the average ratio we can expect across the league of missed calls between any two teams on a given night?
  • What is the average ratio we can expect in games featuring the Oilers?
  • If penalties were called at an even pace relative to the team rather than trying to even them up between teams, how many powerplays might the Oilers expect to receive per game?
  • Given all of the above, how many penalty kills might the Oilers expect per game?
  • What is the likelihood of any particular foul being called? What is the rate between fouls?
  • What recommendations might we make to the Oilers or NHL based on the small window of data we have available on this topic?

I will be posting the information over the summer, but I will begin with the data I collected on the San Jose – Los Angeles first round playoff series for the sake of inserting some topical playoff data into the conversation and to help illustrate to Oilers fans what the officiating landscape might look like in the event they ever see the post-season again in our lifetimes.

I would like to point out that I am not looking to make this a public pillory of the NHL referees. This project highlighted to me the immense difficulty of calling a game at full speed on ice level. It was tremendously taxing doing it from the television camera angle with the benefit of immediate rewind and playback. What I am searching for, however, is a level of consistency and to determine if there exists any quantifiable evidence of a distinction between what two competing teams are capable of doing before being penalized.

So with that, I open it to you, the reader. Given my description of the data available, what questions regarding officiating would you like answered? You can ask in the comments section below or tweet me @CodexRex.
