Zooming out, as I’m apt to do, here’s another quick reminder and exploration of the various algorithms. This is restricted entirely to the graph methods.
The goal here has always been to create a pecking order of the teams, based off of wins and losses and nothing else. This means a DAG – a directed acyclic graph. Acyclic means no beatloops.
Beatloops exist and there’s no way around that. The goal is to remove them. The question was what a beatloop means. Does it mean the teams in a beatloop are tied? The thought here was to reject that – a beatloop doesn’t mean that the teams are tied; it just means that the relationship between the teams is ambiguous.
This frees us up a bit – all we have to do is remove ambiguity and just rely on the rest of the graph to sort itself out into a rough pecking order.
We want to retain clear win/loss relationships and reject ambiguous data, which means rejecting the most “clearly ambiguous” data. To me, this has always meant the smallest set of data. I’ve had it linked in my head that the smallest set of ambiguous data was, by definition, the most tightly linked set of ambiguity. It’s clear that in the NFL this is not necessarily true: I think a four-team beatloop can be more tightly ambiguous than a three-team beatloop. But more on that later.
But I think the general principle is to remove the smallest amount of ambiguity, in order to retain other clear relationships.
So one approach was always to resolve the smaller beatloops first. At first, this makes sense – two-team beatloops (home-and-home season series splits) mean that it’s hard to say which team is better than the other based only on the games between the two teams. That’s intuitive. That’s a clear example of “tight ambiguity”. It’s an easy decision to remove season splits. Leaving those links in can lead to an absolute ton of beatloops, and it was gratifying to see how much simpler the graph got after removing even only the season splits.
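As a rough illustration of the season-split step, here is a minimal Python sketch. It assumes each win is stored as a (winner, loser) pair; the function name and the team letters are made up for the example and are not taken from the actual site code.

```python
def remove_season_splits(wins):
    """Drop every win that is answered by a win in the opposite direction.

    `wins` is an iterable of (winner, loser) pairs. A season sweep
    (the same pair listed twice) simply collapses into one link.
    """
    edges = set(wins)
    return {(a, b) for (a, b) in edges if (b, a) not in edges}


# Hypothetical teams: A and B split their series, so both links go away.
games = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
print(sorted(remove_season_splits(games)))  # [('A', 'C'), ('C', 'D')]
```

Only the A/B split disappears; A=>C and C=>D are untouched because nothing contradicts them directly.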
Next was, again, the question: what is the best way to fairly remove the smallest set of ambiguous data? My approach here was – when there’s a question, punt – as a way to avoid as much subjectivity as possible. (Leaving aside the semantic discussion of how even choosing to create this website was an exercise in subjectivity. 🙂 )
Punting meant that, rather than trying to determine how to split the collection of three-team beatloops into several families, I would just remove all of them at once, no matter how much they overlapped.
This has been the base system from the beginning. I like it because it’s simple enough that I believe anyone can grasp the approach without forgetting the details. Map all the victories. Remove the season splits. Remove the beatloops, smallest first. Done.
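The three steps above can be sketched in a few lines of Python. This is only a toy version under my own assumptions: wins are (winner, loser) pairs, the function names `find_loops` and `resolve_smallest_first` are invented for the example, and the loop search is brute force, so it would crawl on a full season of data.

```python
def find_loops(edges, size):
    """Return every beatloop with exactly `size` teams.

    Each loop is a tuple that starts at its alphabetically smallest
    team, so the same loop is never reported twice.
    """
    beats = {}
    for winner, loser in edges:
        beats.setdefault(winner, set()).add(loser)
    loops = set()

    def walk(path):
        if len(path) == size:
            # The loop closes only if the last team beat the first one.
            if path[0] in beats.get(path[-1], ()):
                loops.add(tuple(path))
            return
        for nxt in beats.get(path[-1], ()):
            if nxt > path[0] and nxt not in path:
                walk(path + [nxt])

    for start in beats:
        walk([start])
    return loops


def resolve_smallest_first(wins):
    """Map all victories, then remove beatloops, smallest size first.

    Size 2 is the season-split case. All loops of one size are found
    before any of them is removed, so overlap does not matter.
    """
    edges = set(wins)
    team_count = len({team for game in edges for team in game})
    for size in range(2, team_count + 1):
        for loop in find_loops(edges, size):
            for i in range(size):
                edges.discard((loop[i], loop[(i + 1) % size]))
    return edges


# Hypothetical teams: A=>B=>C=>A is a 3-team loop, A=>B=>C=>D=>A a 4-team loop.
games = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A")]
print(sorted(resolve_smallest_first(games)))  # [('C', 'D'), ('D', 'A')]
```

In this example the 3-team loop is removed first, which breaks the 4-team loop as a side effect, so C=>D and D=>A survive. That is exactly the kind of ordering effect the smallest-first choice bakes in.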
So what are the subjective choices I made here? First, there was the desire to actually end up with a DAG (directed acyclic graph). We could even challenge that one and choose to determine rankings from a graph with loops in it, but I want to avoid that direction for now. It’s not visual enough and doesn’t strike me as intuitive to a casual graph viewer.
So the first subjective choice was to remove smallest loops first. We already have one open challenge there – Boga sketched out a quick algorithm that I’d like to explore (if I can just find the comment again). That would be to work on resolving all loops at once, I think (or it might have just been how to rank teams before the loops are taken out).
And I’m also pretty convinced at this point that resolving smallest loops first is fully sound only if the teams all play each other, as they do in the NBA and MLB. In the NFL, a four-team beatloop can exist that arguably should be removed at the same time as some three-team beatloops, specifically because of the rarity of inter-conference games, which is not a team’s fault.
The other subjective choice was to just remove all n-sized beatloops at once, no matter how much they overlapped. The thought here was to avoid the subjectivity of choosing to remove 3-team beatloops before removing other 3-team beatloops. But in a sense I am already introducing a similar subjectivity by removing 3-team beatloops before removing 4-team beatloops.
So while one direction is resolving all beatloops at once, the other direction is to choose to split out (e.g.) 3-team beatloops into multiple families, and resolve some of them first, and then the others.
There are some easy ways to chip away at this that would be noncontroversial. The problem is not a team being in several beatloops; it’s a single segment (TeamA=>TeamB) being in several beatloops. So you could first remove every beatloop that shares none of its segments with any other beatloop. If I were to do this and then resolve the remaining beatloops with the current rules, I’m pretty sure I’d end up with the exact same result. But it could make each step of the process clearer to examine. There is also room to explore how to remove some beatloops with overlapping segments before removing others.
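Here is one way the "isolated loops first" idea could be sketched in Python. It assumes the beatloops of a given size have already been found and are written as tuples of teams in beat order; `loop_segments` and `split_isolated` are names I made up for the example.

```python
from collections import Counter


def loop_segments(loop):
    """Turn a loop like ("A", "B", "C") into its segments A=>B, B=>C, C=>A."""
    return [(loop[i], loop[(i + 1) % len(loop)]) for i in range(len(loop))]


def split_isolated(loops):
    """Separate loops that share no segment with any other loop.

    Returns (isolated, overlapping). Sharing a team is fine; only a
    shared segment makes a loop count as overlapping.
    """
    counts = Counter(seg for loop in loops for seg in loop_segments(loop))
    isolated, overlapping = [], []
    for loop in loops:
        if all(counts[seg] == 1 for seg in loop_segments(loop)):
            isolated.append(loop)
        else:
            overlapping.append(loop)
    return isolated, overlapping


# Hypothetical loops: the first two share the segment A=>B.
# The third shares team C with the first, but no segment.
loops = [("A", "B", "C"), ("A", "B", "D"), ("C", "E", "F")]
isolated, overlapping = split_isolated(loops)
print(isolated)     # [('C', 'E', 'F')]
print(overlapping)  # [('A', 'B', 'C'), ('A', 'B', 'D')]
```

The isolated group could be removed without argument, leaving only the overlapping group as the place where any family-splitting rule would have to make real choices.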
Iterative is an example of an approach that seeks to remove segments, not loops: it identifies actual single-game outcomes that should be ignored in the initial data set, as opposed to entire loops. Doktarr, please correct me if I’m wrong on this. We’ve had one other similar approach of ignoring actual segments: the beatfluke approach, which obliterates beatloop segments that are directly contradicted by the resultant DAG, thereby restoring the rest of the beatloop to the graph. There is one other obvious segment-removal approach we haven’t explored yet: finding the fewest matches in a season that can be excluded in order to create a DAG. (If there are ties, for instance five different ways to remove only seven game outcomes, we’d narrow it down by first removing the game outcomes that appear in all scenarios, and then rank the remaining candidates through some set of objective-as-possible standards.) Doktarr has been very patient with my reluctance to get back into segment-removal. 🙂
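To make the unexplored "fewest excluded matches" idea concrete, here is a brute-force Python sketch. Graph theory calls this a minimum feedback arc set, and trying every combination gets slow very quickly, so treat this as a toy for small examples only. The function names and teams are hypothetical, and this is not the iterative or beatfluke method.

```python
from itertools import combinations


def is_acyclic(edges):
    """True if the links contain no beatloop.

    Repeatedly strips teams that nobody left in the graph has beaten;
    if teams remain but all of them have a loss, there is a loop.
    """
    edges = set(edges)
    teams = {team for edge in edges for team in edge}
    while teams:
        unbeaten = {t for t in teams if not any(loser == t for _, loser in edges)}
        if not unbeaten:
            return False
        teams -= unbeaten
        edges = {(w, l) for w, l in edges if w not in unbeaten}
    return True


def smallest_removals(wins):
    """Every way to reach a DAG by excluding the fewest possible links."""
    edges = sorted(set(wins))
    for k in range(len(edges) + 1):
        options = [set(combo) for combo in combinations(edges, k)
                   if is_acyclic(set(edges) - set(combo))]
        if options:
            return options


# Hypothetical season: A=>B sits in two loops; X=>Y=>Z=>X is a separate loop.
games = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "A"),
         ("X", "Y"), ("Y", "Z"), ("Z", "X")]
options = smallest_removals(games)
common = set.intersection(*options)
print(len(options), common)  # 3 {('A', 'B')}
```

Here there are three tied ways to exclude only two games. A=>B appears in all of them, so it would be removed outright, and some further standard would still be needed to pick among the three X/Y/Z links.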
So I’ve got a couple of questions here.
1) Doktarr, is it possible to restate iterative logically, in a way that doesn’t use ratios or fractional numbers? Is it as simple as just removing NYJ->MIA first because that link appears the most times (5) across all the 3-team beatloops? I think someone actually did restate it this way once, but I might be confusing it with Boga and MOOSE’s discussion.
2) Also, is it possible to restate iterative in terms of full beatloops being removed?
3) Finally, more thoughts from anyone would be appreciated on how to judge removing beatloops if multiple sizes of beatloops are considered at once. There’s obviously a problem with this, in that by the time every team has at least one win and one loss, the entire league is pretty much one big beatloop. The only thing that we can REALLY say at this point is that Detroit is definitely worse than every team they have directly played. But every other team can draw a circuitous beatpath to any team that has beaten them.
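The "one big beatloop" worry in question 3 can be checked mechanically. Below is a small Python sketch, with invented function names and a made-up four-team league, that lists for each team the other teams it shares a beatpath with in both directions. A winless team (D here, standing in for the situation described above) ends up with none.

```python
def beaten_by_path(wins, start):
    """Every team `start` has a beatpath to, directly or through others."""
    beats = {}
    for winner, loser in wins:
        beats.setdefault(winner, set()).add(loser)
    seen, stack = set(), [start]
    while stack:
        for loser in beats.get(stack.pop(), ()):
            if loser not in seen:
                seen.add(loser)
                stack.append(loser)
    return seen


def mutual_beatpaths(wins):
    """For each team, the other teams it is tangled in some beatloop with."""
    teams = {team for game in wins for team in game}
    reach = {t: beaten_by_path(wins, t) for t in teams}
    return {t: {u for u in reach[t] if u != t and t in reach[u]} for t in teams}


# Hypothetical league: A, B and C beat each other in a circle; D lost to all.
games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")]
tangled = mutual_beatpaths(games)
print(sorted(tangled["A"]))  # ['B', 'C']
print(sorted(tangled["D"]))  # []
```

If nearly every team's set contains nearly every other team, then looking at all loop sizes at once gives you almost nothing to hold on to, which is the problem described in question 3.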