Zooming out as I’m apt to do, here’s another quick reminder exploration of the various algorithms. This is restricted entirely to the graph methods.

The goal here has always been to create a pecking order of the teams, based off of wins and losses and nothing else. This means a DAG – a directed acyclic graph. Acyclic means no beatloops.

Beatloops exist and there’s no way around that. The goal is to remove them. The question was what a beatloop means. Does it mean the teams in a beatloop are tied? The thought here was to reject that – a beatloop doesn’t mean that the teams are tied; it just means that the relationship between the teams is ambiguous.

This frees us up a bit – all we have to do is remove ambiguity and just rely on the rest of the graph to sort itself out into a rough pecking order.

We want to retain clear win/loss relationships and reject ambiguous data. Which means rejecting the most “clearly ambiguous” data. To me, this has always meant the smallest set of data. I’ve had it linked in my head that the smallest set of ambiguous data by definition meant the most tightly linked set of ambiguity. It’s clear that in the NFL, this is not necessarily true. I think it’s true that a four-team beatloop can be more tightly ambiguous than a three-team beatloop. But more on that later.

But I think the general principle is to remove the smallest amount of ambiguity, in order to retain other clear relationships.

So one approach was just always to resolve the smaller beatloops first. At first, this makes sense – two-team-beatloops (home-and-home season series splits) mean that it’s hard to say which team is better than the other just based off of the wins between the two teams. That’s intuitive. That’s a clear example of “tight ambiguity”. It’s an easy decision to remove season splits. Those links in there can lead to an absolute ton of beatloops, and it was gratifying to see how much simpler the graph got after removing even only the season splits.

Next was again the question of what is the best way to fairly remove the smallest set of ambiguous data? My approach here was – when there’s a question, punt – as a way to avoid as much subjectivity as possible. (Leaving aside the semantic discussion of how even choosing to create this website was an exercise in subjectivity. 🙂 )

Punting meant – rather than trying to determine how to split apart the collection of three-team beatloops into several families, to just remove all of them at once, no matter how much they overlapped.

This has been the base system from the beginning. I like it because it’s simple enough that I believe anyone can grasp the approach without forgetting the details. Map all the victories. Remove the season splits. Remove the beatloops, smallest first. Done.
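For concreteness, here is a minimal sketch of that base system in Python. This is only my illustrative brute-force reading of the steps above (the function name and the `max_len` cutoff are my own inventions, and a real implementation would need proper cycle enumeration rather than trying every permutation of teams):

```python
from itertools import permutations

def remove_loops_smallest_first(edges, max_len=4):
    """Sketch of the base system: map all victories as (winner, loser)
    edges, then remove every beatloop of each size at once, smallest
    size first (size 2 = season splits)."""
    edges = set(edges)
    teams = {t for e in edges for t in e}
    for size in range(2, max_len + 1):
        doomed = set()
        # brute force: check every ordering of `size` teams for a cycle
        for cycle in permutations(teams, size):
            links = [(cycle[i], cycle[(i + 1) % size]) for i in range(size)]
            if all(l in edges for l in links):
                doomed.update(links)
        edges -= doomed  # all loops of this size vanish together
    return edges
```

On a toy graph with one three-team beatloop A->B->C->A plus a clean win C->D, only C->D survives, which matches the described behavior.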

So what are the subjective choices I drew here? First, there was the desire to actually end up with a DAG (directed acyclic graph). We can even challenge that one and just choose to determine rankings based off of a graph with loops in it, but I want to avoid that direction for now. It’s not visual enough and doesn’t strike me as intuitive to a casual graph viewer.

So the first subjective choice was to remove smallest loops first. We already have one open challenge there – Boga sketched out a quick algorithm that I’d like to explore (if I can just find the comment again). That would be to work on resolving all loops at once, I think (or it might have just been how to rank teams before the loops are taken out).

And I’m also pretty convinced at this point that resolving smallest loops first is more sound only if the teams all play each other, as they do in the NBA and MLB. In the NFL, a four-team beatloop can exist that doesn’t necessarily deserve to be removed later than some three-team beatloops, and this is specifically because of the rarity of inter-conference games, which is not a team’s fault.

The other subjective choice was to just remove **all** n-sized beatloops at once, no matter how much they overlapped. The thought here was to avoid the subjectivity of choosing to remove 3-team beatloops before removing other 3-team beatloops. But in a sense I am already introducing a similar subjectivity by removing 3-team beatloops before removing 4-team beatloops.

So while one direction is resolving all beatloops at once, the other direction is to choose to split out (e.g.) 3-team beatloops into multiple families, and resolve some of them first, and then the others.

There are some easy ways to chip away at this that would be noncontroversial. The problem is not a team being in several beatloops; it’s a single segment (TeamA=>TeamB) being in several beatloops. So you could first remove every beatloop where none of its single segments are in other beatloops. If I were to do this and then resolve the remaining beatloops with the current rules, I’m pretty sure I’d end up with the exact same result. But it could introduce some greater clarity in terms of examining each step of the process. There is room to explore in terms of choosing how to remove some beatloops with overlapping segments before removing other ones.
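That noncontroversial first pass can be sketched as follows, assuming beatloops have already been enumerated as sets of links (the function name and the loop representation are my own, not from any real implementation):

```python
from collections import Counter

def remove_isolated_loops(edges, loops):
    """First pass: delete only those beatloops none of whose individual
    segments (links) appear in any other beatloop."""
    # count how many loops each link participates in
    counts = Counter(link for loop in loops for link in loop)
    edges = set(edges)
    for loop in loops:
        # a loop is 'isolated' if every one of its links occurs exactly once
        if all(counts[link] == 1 for link in loop):
            edges -= set(loop)
    return edges
```

Two loops that share a segment are both left untouched by this pass; a loop that shares nothing with any other loop is deleted outright.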

Iterative is an example of an approach that seeks to remove segments, not loops – it identifies individual game outcomes that should be ignored in the initial data set, as opposed to entire loops. Doktarr, please correct me if I’m wrong on this. We’ve had one other similar approach of ignoring individual segments: the beatfluke approach, which obliterates beatloop segments that are directly contradicted by the resultant DAG, thereby restoring the rest of the beatloop to the graph. There is one other obvious segment-removal approach we haven’t explored yet, and that is the one that merely seeks to find the fewest number of game outcomes in a season that can be excluded in order to create a DAG. (For ties – say, five different ways to remove only seven game outcomes – we’d narrow down by removing the game outcomes that appear in all scenarios, and then rank the remaining candidates through some set of objective-as-possible standards.) Doktarr has been very patient with my reluctance to get back into segment-removal. 🙂

So I’ve got a couple of questions here.

1) Doktarr, is it possible to restate iterative logically, in a way that doesn’t use ratios or fractional numbers? Is it as simple as just removing NYJ->MIA first because that link appears “most” (5) in all the 3-team beatloops? I think someone actually did restate it this way once, but I might be confusing it with Boga and MOOSE’s discussion.

2) Also, is it possible to restate iterative in terms of full beatloops being removed?

3) Finally, more thoughts from anyone appreciated on how to judge removing beatloops if multiple sizes of beatloops are considered at once. There’s obviously a problem with this, in that by the time every team has at least one win and one loss, the entire league is one big beatloop. The only thing that we can REALLY say at this point is that Detroit is definitely worse than every team they have directly played. But every other team can draw a circuitous beatpath to any team that has beaten them.

I’ve been out of the country for the last few weeks, and I’m just getting caught up with this stuff. Before I left, I had thought that I’d finished up my code to look at removing beatloops. But of course, by week 6, the data found a way to do something I didn’t anticipate. I should have the kinks worked out by the end of the break.

As to your point 1), that is what I was proposing, except merging it with point 3) so as to look at every beatloop. I was thinking of calling it the counting method. I have a few early comments about week 5, and hopefully I will have a post soon with pretty pictures and current results comparing the methods.

There were 252 unique beatloops in week 5, comprising 37 different beatwins (out of roughly 80 beatwins). Surprisingly, only about 7 to 9 teams would lose every win and loss. I’m sure this would increase to every team within a few weeks.

The counting method would look at every beatwin in these loops and eliminate the one occurring the most. KC->Den appeared 223 times, so that link is broken first. Then, out of the remaining 29 beatloops, Ind->Min occurred the most. And so on. In the end, 7 beatwins were removed.
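Here is a minimal sketch of my reading of the counting method (all names are invented; the brute-force loop enumeration obviously would not scale once the number of unique beatloops explodes):

```python
from collections import Counter
from itertools import permutations

def find_loops(edges, max_len=4):
    """Enumerate every beatloop of up to max_len teams, brute force.
    Each loop is returned as a frozenset of its (winner, loser) links."""
    teams = {t for e in edges for t in e}
    loops = set()
    for size in range(2, max_len + 1):
        for cycle in permutations(teams, size):
            links = frozenset((cycle[i], cycle[(i + 1) % size]) for i in range(size))
            if all(l in edges for l in links):
                loops.add(links)  # rotations collapse to the same frozenset
    return loops

def counting_method(edges, max_len=4):
    """Repeatedly delete the single beatwin appearing in the most loops,
    recounting after each removal, until no loops remain."""
    edges = set(edges)
    while True:
        loops = find_loops(edges, max_len)
        if not loops:
            return edges
        counts = Counter(link for loop in loops for link in loop)
        worst, _ = counts.most_common(1)[0]
        edges.discard(worst)
```

On two three-team loops sharing the A->B link, removing just that one link breaks both loops and keeps the other four beatwins, whereas the standard method would have removed all five.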

The standard method removed 18 beatwins, and the iterative removed 14 or 15ish (just by eyeballing the graph). I am assuming that this gap will widen as the weeks pass; we’ll see.

Logically, I can’t see another way actually being able to remove fewer beatwins than what I proposed (which doesn’t prove there isn’t one).

Which brings us to the subjectivity, as removing the least (or retaining the most data) doesn’t necessarily mean it’s the “best” data. I’m sure we’ll have some good discussion over that.

Boga

http://www.beatgraphs.com/BreakingBeatLoops.php

For an explanation of how the Iterative Method works.

Moose’s explanation is really great. Two thumbs up.

To answer the questions directly:

1) In the case of the first link removed, yes, it’s because it was in the most loops (unless it’s a double strength win from a season sweep). After the first link is removed, though, it’s no longer that simple, because a link that was weakened in a previous iteration may get removed before one that is in more loops.

2) Iterative DOES remove full loops. That’s all it does. It just removes them in fractional amounts, until one of the links in the loop disappears.

Boga’s method is interesting, for sure. I would argue that you probably shouldn’t recompute the beatloop counts after removing the most frequent win, as this seems like it could introduce some bias. Another major issue here is that there are probably billions of unique beatloops in the data at this point. It’s an exponentially growing set.

Just to reiterate (ha ha), or maybe restate, the #1 thing I am stuck on regarding removing segments is that it feels like it is putting a value judgment on certain victories as being invalid.

Removing an entire beatloop is defensible because we’re merely removing ambiguity. But removing a segment is like saying that win is more fluky than the other wins in that beatloop.

And in a situation like NYJ->MIA, the truth is that the victory is just causing a lot of problems due to scheduling. But the schedule creating a lot of problems centered around that game doesn’t mean that the game itself is by definition more fluky than other games.

So it’s like we’re saying, “This is causing problems, therefore, there isn’t REALLY any evidence that the NY Jets are better than Miami. They didn’t REALLY beat Miami.” When there’s actually no reason to think that. Removing entire beatloops is fine, because in exchange for ignoring evidence that Team A is better than Team B, we’re also ignoring evidence that Team B is better than Team A.

So that’s why I philosophically disagree with the segment-removal methods. (Doktarr, I see your point about it incrementally removing beatloops, but after beatloops are resolved, even partial-strength beatwins are seen as full beatwins again, so the end result is that it’s a segment-removal method.)

In contrast, beatflukes was somewhat defensible to me because at least then there was other independent graph data (entirely different beatpaths) to “prove” that Team B was better than Team A.

It’s a segment removal method in the sense of what the final graph looks like, but the weights left can be used in ranking, which is what Moose does.

Iterative *is* considering independent graph data (entirely different beatpaths). If A->B->C->A is true, and A->B->D->A, then that’s two different independent paths (B->D->A and B->C->A) that demonstrate team B is better than team A. I don’t see the distinction you are drawing.

I mean an independent beatpath outside of any of the beatloops being considered. And actually, I think the point still remains, because you’re still making certain wins – certain evidence that one team is better than the other – less relevant, but only because they’re causing more problems in the graph, not because there’s actual reason to see the victory itself as fluky.

Just to be clear, I’m not saying iterative is inferior or any of that, I think it’s a great variation and I love thinking about it and considering it. I’m just bringing it up as additional reason why I want to also continue to consider approaches that continue to remove full entire beatloops, specifically because that respects the idea of removing ambiguity rather than making value judgments on the flukiness of certain victories.

It’s kind of a basic principle – I think that if we were to clearly adopt the principle of judging what victories look fluky – something that the parallel system doesn’t do at all – then it could also open the door to all kinds of additional good thinking. My only point is that it’s of a different philosophical family. I’m just trying to remain clear about the different philosophical approaches.

Excellent point TT. I think we can all agree that if we do NOT remove any beatloops, the graphs get pretty worthless. However, I feel that removing an entire beatloop at once is open to the same argument you are making against Doktarr and me, just on a larger scale. Say we have two beatloops,

A-B-C-A and A-B-C-D-A

You choose to remove the smaller one, thus leaving D-A. Why not remove the larger one? Both C and D beat A. And C also beat D. It seems like one is saying, I have no idea how A,B,C fit into the graph, but I know D is better than A. It seems a pretty arbitrary decision, other than wanting to retain the most data. And if retaining the most data is the goal, then looking at the individual beatwins within loops would solve that. Let’s take it to the extreme.

Say every NFL team played every other NFL team once. And going into the last week, the mighty broncos have won every single game. And the lowly lions have lost every game (oh wait, they have). They are playing each other in the final week and the lions win. This would create a three-team beatloop with every team. The standard method would then eliminate DET’s only beatwin, and every single one of DEN’s beatwins.

Ask any random person, and they would all agree that DEN is one of the best teams that season (they might not agree they are the best, but they would all agree they are easily in the top teams). However, the graph would show DEN all alone, saying that there is no way to figure out where they are. This doesn’t pass the smell test to me.

However, looking at the individual beatwins (either Doktarr’s method or mine), it would agree to eliminate the DET-DEN beatwin. I really think it’s the beatpaths philosophy trying to tell us something. That one beatwin would cause an enormous number of beatloops (3-, 4-, 5-team beatloops, and so on). The algorithm itself is saying, “Look, I looked at all this data you gave me, and geez, this one game does not fit at all with all the other data presented.”

If we only compared the two individual beatwins of DET-DEN and DEN-whoever all by themselves, sure, there isn’t any sound basis for eliminating one rather than the other. But then if you look at all the other hundreds of beatwins also in the dataset, DET-DEN really disagrees with everything else that is there.

By eliminating the most common beatwins, we are in effect removing the data that disagrees with the whole set the most. And in removing the ones that disagree the most, we get closer to the general consensus that the beatgraphs are trying to show much faster, and in my mind, more accurately.

Boga

And a comment about beatflukes. From the example back on Thanksgiving 2005, where you explain beatflukes, say we have these two beatloops (which are the same loops you used; I’m too lazy to write out the full teams):

A-B-C-A and A-D-E-F-C-A

In the standard method, we wouldn’t ever “see” the second beatloop, because part of it belongs to a smaller beatloop. Then we would remove the A-B, B-C, and C-A links. The second loop would then be reduced to

A-D-E-F-C. And then we would have A better than C, even though we eliminated C-A directly. You would then say, because of this, we should break the beatloop at C-A and retain the data.

This is almost (if not) EXACTLY what my method is doing. Am I wrong here?

Boga

I’m certain you are all well aware that what you are doing is a topological sort of a cyclic graph. When I was taught this algorithm many years ago, the assumption was that once you hit a cycle, you could pick any vertex (i.e. a random one) in the cycle and continue the process on. I believe that gives one an upper bound on the number of different orderings possible; for a 32-team cycle and a 16-game season, it would max out at 32 choose 16. Of course, you are trying to do better than that – picking a vertex that isn’t random, but has some basis for being chosen first – and that’s very laudable. Your work actually has implications outside of simply ranking sports teams. For example, one of the methods tried, the weighted method, attempts to use the strength of the victory (expressed as a point difference) to determine which edges are weakest and should be broken first. This is attempting to apply more information than just the W/L record to the graph. There are other domains where weighted graphs are used, such as shortest-path and max-flow algorithms.

With this in mind, I will submit some follow-on messages, with some specific suggestions and comments.

Thanks for that insight Chris. I’ll be very excited to read your additional comments. I’ve been doing a little research into “small world network” theory to see how that might affect graphing–given that not all NFL teams play each other, but instead play only a certain subset of opponents. But it seems like you have some actual education in this topic, and perhaps you can shed some light on various models.

To start with, I posted a query on comp.theory to see if this problem has been studied before or has connections to other known problems. Although I have some background in the theory, I am by no means an expert, but hopefully I may be able to find more expertise to apply to the problem. It seems like the kind of problem that someone must have attacked before. It looks like it might be called the minimum feedback arc set problem.

I can also see how this can be mapped to a traditional statistics problem, where we have sampled some data and we want to infer the probable measurements of the population. One can get quite accurate estimates with fairly small samples, which is why people take exit polls to predict general elections. With 16 sets of 16 games, we have 256 different games’ worth of information to draw on; that may not be that small a set.

Now, back to what I know about this problem. If we look at this as a traditional topological sort problem, one can pull off the acyclic ends of the set – the teams with either no wins or no losses – to rank the very best and very worst teams (or at least the teams that are probably the best and the worst). All the algorithms suggested do that. However, we eventually get to the cyclic part of the graph, and those cycles have to be broken.

One thing I think is commonly done is to “merge” the cycles and treat them as single elements. Say 3 teams form a cycle – a beatloop, A-B-C. Instead of removing the loop from the graph, replace the 3 teams in the loop with 1 team, “ABC”, and take the wins from each of the individual teams and attribute them to the group. I don’t know if this will have a different effect than removing the loop, but it might. Merging the teams into 1 entity preserves the ambiguity rather than removing it.
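A sketch of that merge idea, assuming we already know which teams form the cycle (the function and node names are made up for illustration):

```python
def merge_loop(edges, loop_teams, merged_name):
    """Replace every team in loop_teams with a single merged node,
    attributing their external wins and losses to the group; games
    played inside the group simply disappear."""
    merged = set()
    for winner, loser in edges:
        w = merged_name if winner in loop_teams else winner
        l = merged_name if loser in loop_teams else loser
        if w != l:  # drop games internal to the merged group
            merged.add((w, l))
    return merged
```

With the A-B-C loop merged, a win C->D becomes ABC->D and a loss E->A becomes E->ABC, so the group keeps its place in the pecking order instead of vanishing.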

The next thought is that the problem also has a “dual”. A lot of the discussion has been about removing fluke games and how to determine if a game is a fluke. However, you can break cycles another way too, by removing inconsistent teams. If some team, say the Broncos or the Jets, has a lot of games that are involved in loops, removing that team from the graph makes all those loops disappear. For example, with two loops A-B-C[-A] and A-D-E[-A], if one removes team A, both loops break. It might be interesting to see what is the smallest set (or sets) of teams to remove to break all the loops. Then you might break the loops involving those teams first. It is probably worth experimenting with breaking whole loops and with breaking only the games from the inconsistent teams.

For example, consider the following graph: A-B-C[-A], A-D-E[-A], B-D, C-E. If you entirely remove the loops involving A, you only have the B-D and C-E results left. However, if you just discard the A games, one is left with the B-C, B-D, C-E, and D-E games, suggesting a ranking of B first, E last, with C and D ambiguous in the middle.
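That example is easy to check mechanically. Here is a trivially short sketch of the “discard the inconsistent team’s games” dual (the function name is my own):

```python
def drop_team_games(edges, team):
    """Discard every game involving `team`; any loop passing through
    that team disappears along with its games."""
    return {(w, l) for (w, l) in edges if team not in (w, l)}
```

On the graph above (A->B, B->C, C->A, A->D, D->E, E->A, B->D, C->E), dropping team A leaves exactly the B-C, B-D, C-E, and D-E results described.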

Chris, as far as I can tell, the latter technique you’re describing is what the “iterative method” that Moose and doktarr frequently discuss attempts to do — remove the most ‘flukey’ links in the set of beatloops.

http://www.beatgraphs.com/BreakingBeatLoops.php

Merging is something we tried here when the Eagles and Bengals tied, but I’m not sure people would be happy about merging larger loops of teams.