Miscellaneous observations:

Last week’s power rankings are ~~10-5~~ 9-6 so far this week.

The only two teams with beatpaths to every other team in their division: Denver and Arizona.

#31 beat #1, with some awesome direct snap misdirection.

And it’s time to start thinking of tiebreaker and rank strategies.

There are two stages to determining rankings. First is the beatloop resolution strategy. That is pretty stable, although doktarr and moose have written about possible ways to enhance it. The general principle of beatloops is not to imply that the teams in a beatloop are tied – it’s more just that it is the smallest set of data that can be seen as ambiguous/confusing, and thus should be removed. That way we rely on the rest of the graph to imply rankings. I think that trying to divine too much data from a beatloop just introduces too many judgment calls into a graph. We always remove smallest beatloops first, starting with splits, and then recalculate.
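That resolution loop can be sketched in code (my own illustration, not the site's actual implementation): treat each game as a winner→loser edge, enumerate cycles from the smallest size up — starting with splits, i.e. two-team loops — remove every edge in each smallest loop found, and recalculate.

```python
def cycles_of_length(edges, k):
    """Enumerate simple directed cycles of exactly k edges.
    edges: set of (winner, loser) pairs; returns a list of edge-sets."""
    adj = {}
    for winner, loser in edges:
        adj.setdefault(winner, set()).add(loser)
    found = set()

    def dfs(start, node, path):
        for nxt in adj.get(node, ()):
            if nxt == start and len(path) == k:
                # Closing the loop back to the start team completes a cycle.
                found.add(frozenset(zip(path, path[1:] + [start])))
            elif nxt not in path and len(path) < k:
                dfs(start, nxt, path + [nxt])

    for team in adj:
        dfs(team, team, [team])
    return [set(c) for c in found]

def remove_beatloops(edges):
    """Remove smallest beatloops first (splits, then 3-loops, ...),
    recalculating from the smallest size after each removal pass."""
    edges = set(edges)
    n_teams = len({t for e in edges for t in e})
    k = 2  # a "split": two teams that beat each other
    while k <= n_teams:
        loops = cycles_of_length(edges, k)
        if loops:
            for loop in loops:
                edges -= loop
            k = 2  # recalculate, starting from splits again
        else:
            k += 1
    return edges
```

On this week's NE->NYJ->MIA->NE loop, for example, all three edges would be dropped, while any outside edges (say SD->NE) survive and keep implying rankings.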

We’ve tried some methods to bust beatloops here in the past. One that I was fond of was called the beatfluke method, defined as: if Team A’s loss to Team B was beatlooped away (say the loop was A-beats-C-beats-B-beats-A), and Team A also has an entirely different remaining beatpath to Team B, then that beatpath contradicts Team A’s loss, and thus the A-beats-C-beats-B part of the beatloop can be restored to the graph.

I found this made the graph more vertical, and also slightly more accurate, but I didn’t like how it led to more dramatic shifts in the power rankings, making the graphs vary more from week to week. Perhaps if it were combined with a more stabilizing tiebreaker, it could be used again.
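As a sketch of the beatfluke check (again my own illustration, with edges as (winner, loser) pairs and a loop A→C→B→A that removed B's win over A):

```python
def has_beatpath(edges, src, dst):
    """True if a directed win-path from src to dst survives in the graph."""
    adj = {}
    for winner, loser in edges:
        adj.setdefault(winner, set()).add(loser)
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False

def beatfluke_restore(pruned, loop, a, b):
    """If A's looped-away loss to B (the edge (b, a)) is contradicted by an
    entirely different remaining beatpath A -> ... -> B, treat B's win as a
    fluke and restore the rest of the loop (the A-beats-C-beats-B side)."""
    if (b, a) in loop and has_beatpath(pruned, a, b):
        return pruned | (loop - {(b, a)})
    return pruned
```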

The other two approaches to busting beatloops were doktarr’s “iterative” method and Moose’s score method. The “iterative” method breaks shared beatloops at their shared link – that is, when one game is responsible for the existence of several beatloops. It is another effort to identify one link of a beatloop (a game outcome) as flukey. I do have trouble justifying that one intuitively, though – I feel like I need another reason to believe that link actually is flukey, other than it just being part of several beatloops. The other is a weighted system based on score differentials. I believe this ended up accurate and perhaps superior, although I’m trying to keep the main system here free of extra data like points (as opposed to just wins and losses).

After that, there’s how to determine rankings from the resultant beatpath graph. So far this season, I’ve been breaking ties based on the rankings of the previous week. But the usual tiebreaker for later in the season is to compare the strength of the teams’ direct beatwins – for instance, if every team in a tied set has at least three beatwins, it averages the strength of the top three beatwins of each of those teams and picks the top team. Finally, I think Moose came up with a tiebreaker having to do with counting all the links in a resultant beatpath graph. This is somewhat similar to what I used in the first and second year here, which counted the number of teams above and below each team, but it yields more information in that it counts every link of every possible path, thereby giving extra weight to stronger paths. I probably have this explanation wrong, but Moose will correct me in the comments. This is a good candidate to apply as a tiebreaker to the official rankings this season.
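The direct-beatwin tiebreaker could be sketched like so (my own illustration; `rank` here is an assumed mapping of each team to its current position, lower meaning stronger):

```python
def break_tie_by_beatwins(tied, beatwins, rank, top_n=3):
    """Order a tied set of teams by the average rank of each team's top_n
    strongest direct beatwins (a lower rank number = a stronger opponent)."""
    def avg_top_wins(team):
        strongest = sorted(rank[opp] for opp in beatwins[team])[:top_n]
        return sum(strongest) / len(strongest)
    return sorted(tied, key=avg_top_wins)
```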

I love the Beatpaths site… please check your email.

As far as I can tell the only beatloops are NE->NYJ->MIA->NE and CAR->CHI->IND->MIN->CAR. Since they are non-intersecting, my beatloop removal approach would give the same results as yours.

The way I would explain the iterative approach is that it is trying to remove the minimum amount of information from the system to give us non-contradictory results. I doubt that works better for you on a gut level than the way you explained it, but there it is.

All the possible “tiebreakers” work for me. One thought is that you could average last week’s ranking into another tiebreaker method for the next few weeks, and move over to a purely 2008-based tiebreaker once you have more data.

I’m pretty sure that through last season we mostly didn’t like the weighted method because one big victory could throw the chart off permanently. Detroit ended up relatively high last year on the weighted graph because of one blowout, and some other teams were jumbled in a way most people would have disagreed with. The only place the weighted system had merit was in picking the Super Bowl winner, when it judged one team significantly stronger than the other. That is too small an area to consider the entire method superior. I run the weighted graph mostly because it’s another perspective, but if we come up with another way to resolve loops, this will be the one to get dropped.

I also agree that what we’re trying to do here is to find a way to make the best relational graph with the least amount of data, so including score goes against that goal.

With regard to my ranking system, it isn’t so much a “tiebreaker” as the scores themselves determining the ranks. You essentially describe it correctly, though. Each team has all possible paths measured coming in and going out. Every link above them costs a point; every link below gains a point. So for the NYG->WAS->NO->TB->ATL->DET path from last week, NO would get 3 points for the links below them but lose 2 for the ones above, for a net of +1 point. It gets a lot more complicated when branches are involved, but it works out. After the point totals for each team are determined, I run the teams through a normalizing formula to put them all on a scale from -10 to 10, where 0 means the team has the same number of paths going in and out. A high score not only represents being high on the graph, but also having a breadth of teams below and several direct wins over high-ranking teams.

In terms of the Iterative method, essentially it gives you the taller graphs that you like seeing from the BeatFluke method, but with more stability from week to week.

For my part, I’d rather do away with any reference to last year’s information as soon as possible. Now that Week 3 is over, my rankings will no longer use last season for a tiebreaker.

The weighted method this week breaks the NYJ/MIA/NE loop to be MIA->NE->NYJ because the NYJ win over MIA was the closest. The other loop gets broken in two places because IND->MIN and CAR->CHI were both 3-point games. As the graph stands right now, a SD win over NYJ tonight won’t change anything but will give SD a direct arrow to NYJ. If NYJ wins, the graph will get a lot taller.

In the Standard and Iterative methods though, NYJ is currently detached from the graph due to the loop it is in, so tonight’s game will simply decide if they appear above or below SD.

I like your explanation better, doktarr. It’s similar to what we were talking about last season: finding the power rankings that contradict the fewest game results. Although it occurred to me that just because one power ranking has fewer ignored victories than another doesn’t mean it’s more accurate.

I like the iterative approach, but I have trouble reconciling it with the principle of punting on ambiguity and using the non-ambiguous parts of the graph to pick up the slack. Like, within a beatloop, it probably isn’t true that the teams are “tied” in quality – rather, it is probably more informative to use information outside the beatloop to determine which of the teams are stronger than it is to use information inside it. Because if you use information inside the beatloop to break the tie, then another team in the beatloop is always going to have a good reason to disagree.

This goes back to when I was playing around a lot with Condorcet voting. People would always talk about how the likelihood of Schwartz sets and Smith sets was evidence of a “flaw” in the voting system, because any system to tiebreak one of those sets would screw one of the candidates. When the truth was, the voting population really did have ambiguous preferences among the candidates in the resultant Schwartz/Smith set, so the correct approach in those cases would have been to work out either a power-sharing agreement, or to reduce the collection of candidates down to those in the set and stage another vote later (after collecting more data or giving voters more time to consider).