New Power Ranking Method

I think we’re ready to transition from a “beta” power ranking to a “release candidate” power ranking.

In the beta series, the power ranking stuck to a “King Of The Hill” approach, where it was difficult for a team to sink in the rankings unless another team developed a beatpath over them. All in all, it was a pretty good method, but it wasn’t very responsive, and it made it very difficult for good teams to rise. Plus, there were some flaws, as a couple of commenters pointed out – if a team developed a beatpath over a team early on and then lost it, it was still more difficult for the other team to rise above it.

But really the biggest issue is that it just left a lot of beatpath data on the table. It was paying attention to beatwins, but it wasn’t really paying attention to beatlosses or beatloops.

It also helped me home in on exactly how any beatpaths-related power ranking system should work, though. Here is the simple explanation of how the power rankings are figured:

  1. Find all teams with no beatlosses
  2. Choose the best of these teams with a tiebreaker, and append to the rankings
  3. Remove the team and all its direct beatwin arrows from the graph
  4. Repeat from step 1 until every team has been ranked

That very simple routine will guarantee a power ranking consistent with all the beatpaths.
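
Here's a rough Python sketch of that routine, just to make the steps concrete. It isn't the code that produces the rankings on this site. The names `rank_teams`, `toy_graph`, and `toy_scores` are made up for the example, the tiebreaker is passed in as a plain function, and the sketch assumes the beatloops have already been taken out of the graph so that at least one team always has no beatlosses.

```python
def rank_teams(beatwins, tiebreak):
    """Rank teams so that no team is placed above a team with a beatpath to it.

    beatwins: dict mapping each team to the set of teams it has a direct
              beatwin arrow to (assumed loop-free).
    tiebreak: function taking a team and returning a score; higher is better.
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {team: set(beaten) for team, beaten in beatwins.items()}
    # Make sure teams that only ever appear as losers are in the graph too.
    for beaten in list(graph.values()):
        for team in beaten:
            graph.setdefault(team, set())

    ranking = []
    while graph:
        # Step 1: find all teams with no beatlosses (no incoming arrows).
        has_beatloss = set().union(*graph.values())
        unbeaten = [team for team in graph if team not in has_beatloss]
        if not unbeaten:
            raise ValueError("graph still contains a beatloop")
        # Step 2: choose the best of these teams with the tiebreaker.
        best = max(unbeaten, key=tiebreak)
        ranking.append(best)
        # Step 3: remove the team and all its direct beatwin arrows.
        del graph[best]
    return ranking


# Toy example (invented data, not real NFL results).
toy_graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "E": {"C"}}
toy_scores = {"A": 0.9, "E": 0.5, "B": 0.2, "C": 0.7, "D": 0.8}
ranking = rank_teams(toy_graph, toy_scores.get)
print(ranking)  # ['A', 'E', 'C', 'B', 'D']
```

Notice that D has the second-best tiebreaker score in the toy data but still ends up last, because B and C both have beatpaths to it. The tiebreaker only ever chooses among teams that currently have no beatlosses.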

So the question is what tiebreaker to apply to step #2. (Ed note: I’m now using “BeatPower”, which is better explained here. But read on for a more wordy explanation.) Originally I chose the team with the longest beatpath segment, frequently using the previous week’s rankings to break ties. Other options were to take the team with the highest ranking the previous week, to look at the team’s beatloops, or anything else.

But the principles of the method are to rely as much as possible on the entire beatpath graph. And so after pondering that for a while, I figured out a way to factor in a team’s beatlosses, beatloops, and beatwins. In some ways it is similar to the beatpoints concept I wrote about a few entries back, and that has been reported in the power rankings, although it doesn’t really have a mathematical relationship.

The tiebreaker is similar to a win/loss record. You add up the number of beatwins, beatlosses, and unique beatloop teams (including the team itself) for each team. That’s the total number of relationships that team has. Then the formula becomes (wins/total – losses/total). The number can range from 1 to -1.
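
As a small sketch of that arithmetic in Python (the function name `beat_power` and its parameter names are mine, chosen for illustration, and returning 0 for a team with no relationships at all is an assumption, since the post doesn't cover that case):

```python
def beat_power(beatwins, beatlosses, beatloop_teams):
    """Compute wins/total - losses/total, where total is the number of
    beatwins, beatlosses, and unique beatloop teams added together."""
    total = beatwins + beatlosses + beatloop_teams
    if total == 0:
        # Assumption: a team with no relationships sits at the neutral midpoint.
        return 0.0
    return (beatwins - beatlosses) / total


# Denver's Week 8 figures from this post: 22 beatwins, 0 beatlosses, and a
# total of 27, which implies 5 beatloop teams.
print(round(beat_power(22, 0, 5), 3))  # 0.815
```

A team with nothing but beatwins scores 1, a team with nothing but beatlosses scores -1, and every beatloop team added to the total pulls the score toward 0.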

Here are the first few teams in the power rankings for Week 8 of the 2005 NFL season:
IND: 17/17 – 0/17 = 1
DEN: 22/27 – 0/27 = 0.815
SEA: 12/19 – 0/19 = 0.632
NYG: 8/13 – 0/13 = 0.615
PHI: 15/23 – 1/23 = 0.609


And the last three:
HOU: 2/21 – 19/21 = -0.810
CLE: 1/25 – 20/25 = -0.760
GB: 0/30 – 27/30 = -0.900

The end result is that a team with a lot of beatloops will usually be in the middle of the rankings. A team that obliterates its beatloops without developing new beatpaths will still rise in the rankings. A team that wins but finds its downstream beatpaths broken apart could still fall in the rankings.

Overall, the power rankings should be a lot more responsive, and almost entirely related to the beatpath graph of each week. The only time the previous week’s rankings would be looked at is if multiple teams end up with the same point score, but that’s rare with this method. Also, any initial subjectivity in the first “seed” rankings gets bled out quickly.

Here’s the short list of the Beatpaths 2005 NFL Week 8 Power Rankings using this method:

IND, DEN, SEA, NYG, PHI, JAC, DAL, WAS, ATL, NE, PIT, SD, CAR, CIN, STL, KC, ARI, OAK, SF, TB, CHI, BUF, NYJ, NO, MIA, DET, TEN, MIN, BAL, HOU, CLE, GB

4 Responses to New Power Ranking Method

  1. Paul says:

    I like your methods. As a method for looking at the stability of your rankings, have you considered ‘flipping’ your procedure? What I mean is that, instead of beatpaths, you look at ‘losspaths’. Do everything the same, but flip losses and wins. The graphs and loops will be identical, just flipped. Then when you perform your rankings, you should get teams in order of worst to first. Because you’ll remove the worst team instead of the best team, the rankings may not be exactly the reverse. How different they are will give you some sense of the stability of your rankings.

  2. ThunderThumbs says:

    Looks like there are some differences – every few weeks there’s a case where three teams right next to each other are in a different order. Sometimes it happens twice. Last time it happened in 2005 was in week 7, but both week 8 and week 9 were exactly the same. It’s interesting. I can kind of visualize how it happens. If you look at all the teams with no beatwins all at once and compare them, then you’re looking at a collection of teams that you’d probably never look at all at once if you were going in the order of beatwins. I think there’s a danger of this happening whenever you see two teams in the power rankings where the BeatPower points appear out of order. (See Miami and the Jets in Week 9.) And the reason that BeatPower points can appear out of order is because a team might have many more beatloops than another team. Beatloops pull a team towards the center of the rankings, which I think is appropriate. It can mess up the symmetry because if it’s a high-ranked team it will pull the team down, whereas if it’s a low-ranked team it will push them up, and yet the same beatloops will show up whether the method is flipped or not. So I think that explains the lack of stability.

  3. Paul says:

    Sounds like it is a fairly stable algorithm then. And since you agree with the results of beatloops moving teams towards the middle, your algorithm is finished or only requires minor tweaks. Congrats.

  4. ThunderThumbs says:

    Thanks. The main things I find myself considering now are issues like these:

    1) Is it really appropriate for the team itself to be counted as one of the beatloop teams? My best “yes” answer so far is cynical; it appears to make the rankings less volatile.

    2) If a team has a beatloop with another team, yet also has a differently-routed beatpath to that team, the team gets counted in both places. Appropriate? So far, I think so, even though it means the denominator could be greater than 32.

    3) Pat commented on the possibility of handling a situation differently if a beatloop shares an edge (arrow) with another beatloop. I’m still pondering that one.

    I’m basically still rooting around trying to find logical justifications for all three.

    The last niggling thing is that technically, teams have a far more stable (reinforced) beatpath to a team below them if they have also beaten them directly. Right now I just delete those redundant relationships. But it suggests the possibility of a much more complicated but perhaps more stable ranking method, one that has to do with how easily a team would be able to shake off a beatpath from any other team. Earlier this season we showed that Houston would have had to beat Indianapolis six times to be ranked ahead of them. In those situations, it’s not the length of the beatpath that matters – I believe it has more to do with how many routes a team has to another.

    When you look at this week’s graph, you can tell that some beatpaths are just begging to be busted apart, while others seem extremely stable. The rankings don’t yet take any of that into account.
