The Blues and The Abstract Truth: Corsi, Facts vs. Norms

passiontackle
[youtube http://www.youtube.com/watch?v=RbaGDDbpcQ4&w=420&h=315]

I’ve had an ongoing argument with David Staples of The Edmonton Journal about Corsi stats. Staples is a skeptic about the value of the information Corsi offers, especially regarding an individual player. He prefers to use Neilson Numbers to judge a player’s on-ice contributions.

Corsi refers to the “total shot attempts” (shots on goal, blocked shots, missed shots) taken at even strength. It can be expressed in a couple of ways: as a percentage for each team/player, which we typically call “CorsiFor %,” or as a rate per 60 minutes, 20 minutes, etc. for each team/player, which we typically call “CorsiOn.” It can also be qualified in a number of ways (it can be expressed relative to one’s own team, or relative to the opposition one faces).
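To make those two expressions concrete, here is a minimal Python sketch. The function names and the ice-time figure are hypothetical and only for illustration; the shot-attempt totals are the Justin Schultz 2013-14 even-strength numbers quoted later in this piece. Taking the per-60 rate on the for/against differential is also an assumption about how the rate is reported.

```python
def corsi_for_pct(attempts_for, attempts_against):
    """Percentage of all on-ice shot attempts that went the player's way."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


def per_60(count, minutes):
    """Scale a raw count to a rate per 60 minutes of ice time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 60.0 * count / minutes


# Even-strength totals for Justin Schultz in 2013-14, as quoted later in this piece.
SHOTS_FOR, SHOTS_AGAINST = 1036, 1381
# Hypothetical ice time; the piece does not give the real figure.
EV_MINUTES = 1300.0

if __name__ == "__main__":
    print(f"CorsiFor %: {corsi_for_pct(SHOTS_FOR, SHOTS_AGAINST):.1f}")
    # Assumption: the rate is taken on the for/against differential.
    print(f"Differential per 60: {per_60(SHOTS_FOR - SHOTS_AGAINST, EV_MINUTES):.2f}")
```

Note that nothing in either function asks who caused a shot attempt. Both simply count what was recorded while the player was on the ice.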

Neilson Numbers refer to “scoring chances” for and against. However, unlike a simple record of the percentage or rate of scoring chances for/against a team/player, Neilson Numbers add an element. They assign a “+” or a “–” to only those players on the ice who the recorder deems “contributed to” a scoring chance (they get a “+”) or “made a mistake leading to” a scoring chance (they get a “–”).
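The difference between the two kinds of bookkeeping can be sketched in a few lines of Python. Everything here is made up for illustration: the `Event` type, the player labels "A" and "B," and the three events. The `graded` field stands in for whatever a video reviewer decides, which is an assumption about how such marks might be stored and not a description of anyone's actual tracking method.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    on_ice: list[str]   # players on the ice for the tracked team
    is_for: bool        # True if the event went the tracked team's way
    graded: list[str] = field(default_factory=list)  # players a reviewer singled out


def on_ice_record(events):
    """Corsi-style: every player on the ice has the event recorded."""
    tally = Counter()
    for event in events:
        for player in event.on_ice:
            tally[player] += 1 if event.is_for else -1
    return dict(tally)


def graded_tally(events):
    """Neilson-style: only players the reviewer singled out get a mark."""
    tally = Counter()
    for event in events:
        for player in event.graded:
            tally[player] += 1 if event.is_for else -1
    return dict(tally)


# Three invented events with the same two players on the ice.
EVENTS = [
    Event(on_ice=["A", "B"], is_for=True, graded=["B"]),
    Event(on_ice=["A", "B"], is_for=False, graded=["A"]),
    Event(on_ice=["A", "B"], is_for=False),
]

if __name__ == "__main__":
    print("on-ice record:", on_ice_record(EVENTS))
    print("graded tally: ", graded_tally(EVENTS))
```

The first function needs only the event log. The second needs an extra input, the reviewer's judgment, so the two tallies are different kinds of thing even when built from the same events.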

Aside from all the wrangling about Corsi vs. Neilson Numbers on a variety of levels (see Staples’ work on the subject here, here, here and here and see his most prominent critic on the matter, Tyler Dellow, here, here and here), my concern with Staples’ presentation of his analysis in this piece is rather narrow.

Staples treats Corsi events (shot attempts for or against a team/player) as either “plus marks” or “negative marks,” which are either “deserved” or not. Thus, regarding Justin Schultz, he writes:

The false positives and negatives of Corsi%
Just like in the official goals-plus minus system, where Schultz got a plus for every goal he was the ice for and a minus for every goal against he was on for, Corsi gave him a plus for every shot-at-net and a minus for every shot-at-net against, even if Schultz contributed in no way to that shot-at-net or made no mistake on the shot-at-net against.

In other words, he got a massive number of plus marks he didn’t deserve, as well as a massive number of negative marks he didn’t deserve.

But just how many false positives and negatives were there?

This is simply a category error.

On Category Errors in Analysis

The usefulness of any piece of information is highly dependent upon a correct understanding of the nature of that piece of information. A category error occurs when one assigns a quality or action to something it can’t possibly possess or perform. For example, if your hippy friend tells you his special rocks “speak to him,” he is making the category error of assigning speech to rocks.

A common category error lies in mistaking “facts” and “norms” (often this distinction is expressed as “is” and “ought” or “reality” and “ideality/normativity”). A “fact” is simply a piece of objective data expressed in a clear and veridical manner. It has nothing to do with whether an event “ought to” have occurred, or whether X person “deserves” praise or blame for the event. A fact is simply a record of what occurred.

We are familiar with the confusion of facts and norms from our everyday life. Take any political or morally charged conversation, say on television. Routinely, in these scenarios, cynical speakers switch from facts to norms in their own speech and in the interpretation of their interlocutor’s intentions. Thus, the factual statement, “it is raining,” made by X, might be re-purposed by a rival as “X thinks it is good that it is raining.” This is a category error insofar as it treats a statement of fact as if it were a normative statement.

In this situation, I am certain Staples is not cynical in his confusion, but merely mistaken.

A Corsi event is a factual record of what happened, i.e., “Justin Schultz was on the ice at even strength when a shot attempt was made.” This record has nothing to do with Schultz’ just deserts. Whether Schultz is to be praised or blamed for the event having taken place is an entirely secondary concern.

That secondary concern involves a series of questions about how the factual record of Corsi stats is used by analysts, what claims those analysts make upon that record and what the historical record shows regarding the efficacy of the various claims made upon the factual record.

Staples has a variety of concerns about these secondary matters. While I would be happy to argue those secondary matters out with him in the future, first I believe we must get this primary matter sorted out.

The Nature of Corsi

The nature of Corsi, what it actually is, is a record of events. It is a category error to embed any normative analysis (i.e., who should be praised or blamed) into the record.

When Staples talks about a player’s “real Corsi%,” or when he says, “I’m finding 30+ per cent of all Corsi numbers are false positives/negatives. High margin for error” he is making a simple category error.

He is assigning a normative judgment (X deserved, or didn’t deserve, Y) to a factual record.

When he says, “Nobody observed Schultz actually involved in those Corsi plays. How many was he involved in? What per cent?,” or “No one went over video to assign actual plus and minus marks to Schultz on the 1036 shots-at-net for and 1381 shot-at-net against he was on the ice for at even strength in 2013-14,” he is making another category error.

He is interested in assigning praise or blame (+/-) for all hockey events for the sake of player evaluation. Let’s leave aside any questions about the value of this enterprise. My concern here is narrow. Staples’ interest in this enterprise appears to have clouded his understanding of Corsi records.

There is no such thing as a “real Corsi” that involves an analyst grading pluses and minuses. Staples is free to undertake such an enterprise (i.e., record all shot attempt data and charge +s and –s to the apparently responsible parties). But, we should be clear, such an enterprise would be entirely different from the nature of Corsi and therefore not Corsi at all but something else.

Once more, Corsi is a record of events (of this type: “Justin Schultz was on the ice at even strength when a shot attempt was made”). It is as indifferent to a player’s culpability for that event as a volcano is to the peasants it engulfs in molten lava.

So, when Staples says, “I’m uneasy about giving a player credit, or blaming him, for something he didn’t do,” he is showing a fundamental misunderstanding of the nature of Corsi.

This is further reflected in the statement, “Right now, we have inaccurate Corsi estimates for players.” Corsi records are simply the shot data compiled for every game by NHL.com in their event summaries. I asked Staples if he questioned the reliability of this data source and he replied that this was not his concern. Therefore, without a grave misunderstanding of the nature of Corsi records, I can see no reasonable way to come to Staples’ conclusion: “Right now, we have inaccurate Corsi estimates for players.” Corsi is not an estimate, but a record and Staples himself concedes that the record is accurate.

A Bad Analogy

When I suggested, as I have several times now, that Staples distinguish between Corsi records and the various interpretations of Corsi that he objects to, he offered a bad analogy to explain away his persistent category error. He argued, “by your standard, official goals plus-minus is also an empirical stat, but no right-minded person puts a lot of weight in it.” And he further wondered, “I’d like it if you also defended a similar stat to Corsi, NHL goals plus-minus. Same type of stat.”

Now, Staples is certainly correct that the NHL’s traditional “+/–” stat is, like Corsi, a factual record of events. However, his thinking is gravely misled on a number of points.

1. I am not defending Corsi in my objections to Staples.

2. I am, therefore, certainly not defending Corsi in my objections to Staples on the grounds that it is an empirical record of events.

3. I am objecting to Staples’ category error, plain and simple.

4. An accurate record of events is not to be confused with how efficacious or informative that record is, or how predictive it is of future events.

Let’s leave aside the fact that Corsi and traditional +/- are not “the same type of stat” for the moment (they aren’t, +/- involves a variety of game states––short-handed, empty net––that Corsi does not, for example).

Staples’ analogy is poor, first and foremost, because it misunderstands my objection. I am not upholding Corsi as a valuable stat in this conversation at all, let alone because it is an empirical record. This itself is a category error. I’m simply saying that empirical records don’t make judgments, analysts do. And, we should be clear about what it is we are taking issue with.

Another reason Staples’ analogy is off the mark lies in the equivalence it suggests among all records of events. We know from our daily lives that no such equivalence holds.

Take, for example, what I refer to lovingly as “trivial stats” (as opposed to “fancy stats” and “fancy narratives”). A trivial stat is a factual record of an event, which is nonetheless a meaningless bit of fun with zero predictive value and is without consequence in the here and now. Trivial stats are extraordinarily common in commentary about sports. A trivial stat might be “X’s win/loss record pitching on Tuesdays,” or “Y’s only the 5th player to score a goal after eating a whole lemon poppyseed cake.”

These kinds of stats are empirical records of events. So are the NHL’s traditional +/- stats. So are Corsi and Goals Against Average and Fenwick and Runs Batted In and so on. These are all records of events. They, however, do not impose any burden on us to reward them with our belief in their value. Just as we are not expected to reward the statement “over the past five years, Tuesdays have enjoyed 20% more rainfall than every other day,” with any credulity simply because it may be true, I see no reason to follow Staples’ logic and reward +/-.

Staples’ Real Concern

Staples’ real concern lies not in the nature of Corsi, but rather in the interpretations offered by analysts who use Corsi to come to their conclusions. He has “a problem with the way Corsi is used to make judgements about players,” he tells me.

My concern, in this case at any rate, does not lie with Staples’ actual objections to the use of Corsi in making judgments about players.

My concern is that Staples has shown a repeated misunderstanding of the nature of Corsi itself, which makes it difficult to advance any further conversation about the utility of Corsi.

Corsi is not about granting a team or player his due. It is about what happened on the ice.

Analysts are free to take this information, qualify it in any number of ways (say by zone starts, quality of teammates, quality of competition, sample size, with-or-without-you information, relative to team Corsi, etc.) and offer their interpretation up for public consumption and debate. They are not, however, free to make category errors about the nature of the information itself.

[Note: to some non-trivial degree this piece resonates with this article by Cam Charron, Ratings and Records and Misconceptions about Analytics]

 
