Whew! Check out this graph. I don’t think I’ve seen one this tall in the last two years. The power rankings were 8-8 this week. What’s interesting about this week is that there are almost no beatloops. What I mean by that is that almost all of the beatloops were restored to the graph due to beatflukes. Thanks to alternate beatpaths, there are many, many games that are considered fluke victories. Here they are:
NYJ=>NE
ARI=>SF
CAR=>NO
CAR=>BAL
PIT=>KC
PIT=>NO
TEN=>WAS
TEN=>HOU
MIA=>CHI
BUF=>MIN
STL=>DEN
JAC=>PHI
TB=>PHI
MIN=>SEA
MIA=>KC
BUF=>GB
After the beatpath segments that contradict those games are restored to the graph, the only beatloops that are left over are:
CIN=>KC=>SD=>CIN
CAR=>TB=>CIN=>CAR
ATL=>CAR=>CLE=>ATL
ATL=>CIN=>CLE=>ATL
DAL=>HOU=>JAC=>DAL
There will be some surprises in the power rankings this week. The AFC West is looking quite powerful. San Francisco and Green Bay jump in the rankings. Atlanta, Cincinnati, and Tampa Bay all fall.
What does the graph look like without considering beatflukes?
Is there any sort of penalty in the rankings for a team that loses a beatfluke game? By removing the results, you’re essentially penalizing the team that won the game but shouldn’t have, but there doesn’t seem to be any consequence for a team that lost a game the graph says they should’ve won. Three teams have lost two “fluke” games (NO, PHI, KC), and it seems like that should have some sort of effect.
I do think that having beatflukes make for a more interesting system than without them though, so don’t think I’m for removing them.
I was thinking when I looked at the results this week that there was absolutely no way for me to predict what the graph was going to look like. In that, at least, I was right. Wow.
My intuition about the teams actually agrees with most of the beatflukes. The only “fluke” games I’d dispute are MIN=>SEA (and with that long alternate SEA=>MIN beatpath, it’s likely to show up again), ARI=>SF (both teams are bad, should be a beatloop with OAK), and both games involving PIT (a team that inconsistent needs all its games considered).
The advantage of the beatfluke system is that it allows more information to be considered, giving more vertical graphs and less ambiguous rankings. The disadvantage, I suppose, is that it makes the overall graph less stable. For example, if Pittsburgh manages one more highly rated win, then they may get both beatflukes restored, beatfluke away their loss to OAK, and will suddenly vault from the bottom of the graph to very near the top.
Also, totally unrelated question: how do you/does the program decide where to slot ambiguous teams on the graph? For instance, Cincinnati could be where it is, or all the waaaaay down next to Buffalo. I’ve already figured out that the program tries to keep the number of long arrows to a minimum. So in the case of CIN, it is up where it is because it has two arrows going into it, and only one coming out, and it’s better to have one long arrow than two. This also explains Dallas’s position.
But this doesn’t explain every case. A good example this week is Cleveland and Oakland. Both have two arrows coming in and two coming out, so there’s a significant amount of “slack” in where they can be put. Both are currently as high as possible. You could put CLE all the way down next to JAC, and OAK down next to BUF/ATL/CAR, and the total arrow length would be the same. This would jibe more with my perceptions of the teams, which is why I bring it up.
I notice that you color the backgrounds of all teams in the same division the same. How do you pick what color to use?
Adding to some of the other responses this week, is a 45-0 win (PIT>KC) really a fluke? Also, JT mentioned 3 teams with two “fluke” losses. Likewise, there have been 3 teams with 2 “fluke” wins, CAR, PIT and TEN. The results this year may require some additional thinking on what classifies a fluke. The idea with this many “flukes” is that perhaps the additional beatpaths are the flukes. For example, if OAK > PIT was the fluke, then perhaps the result would be a net total of one fewer fluke.
There could be some graph theory involved in minimizing some measure that may be more ‘accurate’.
Whew, popular week.
First, the non-beatfluke graph doesn’t actually look a whole lot different. It’s still pretty vertical. If a team is in a beatloop with another team, it doesn’t mean they don’t have an alternate beatpath to that team. So when a beatfluke restores a beatpath segment to a graph, in many cases, that beatpath segment is partially redundant, already in the graph. Looking at the power rankings for both variants, it looks like all of the teams are within 3-4 slots of where they are in the other ranking. The exceptions are:
1) CIN: #15 vanilla, #24 beatfluke
2) GB: #22 vanilla, #14 beatfluke
Now, I’ve done a significant amount of backtesting and the beatflukes variant is generally more accurate at picking games. However, the vanilla variant was slightly more accurate last week at 9-7 instead of 8-8. (It successfully picked SF over DET.)
More answers in a sec.
As for penalizing teams who have lost a beatfluke game. Well, all the games are always in the system. So, for instance, if Denver starts playing badly and St. Louis starts improving, then the STL=>DEN game will eventually have an impact again – first, through a beatloop (making Denver lose credit for a different win, and St. Louis shedding a beatloss), and perhaps eventually with St. Louis having a beatpath to Denver. So it’s always still in the graph – it’s just that in each case, there was enough other overwhelming evidence to declare it irrelevant for that point of time only.
doktarr – regarding the length of beatpath segments – remember that I have redundant beatpath arrows removed. For instance, Seattle has a direct beatpath to St. Louis, as well as the one through NYG. So that makes it harder for MIN to cancel out its beatfluke over SEA. Also, I’m going to do a quick exercise here – how many times does PIT need to beat IND to have a beatpath to them?
Once: no difference – loops with JAC and DEN have alternate beatpaths, are declared fluke.
Twice: no difference – although it takes the algorithm longer to find, the alternate beatpaths are still there, and both victories are declared fluke.
Three times: finally, Pittsburgh has a beatpath to Indianapolis. Using a phantom season where all the games have turned out the way they have, except with PIT beating IND an additional three times, the top six teams in the power rankings are: DEN, BAL, PIT, CHI, KC, IND.
In the beatfluke variant, what happens is that IND and PIT stay far apart until everything changes all of a sudden. In the vanilla variant, you see PIT quickly climbing and IND slowly falling. I actually like the vanilla variant better that way, but the beatfluke variant is more interesting and slightly more accurate over time.
#4 – the basic tiebreaker is the strength of immediate beatwins.
For instance, the only two teams in the graph right now with no beatlosses are IND and CHI. One of those two has to be #1. This week, I compare their top eight direct beatwins (many of these arrows aren’t visible). This is kind of a strength-of-schedule adjustment. Using my own number system, the average strength (placement in the graph) of IND’s eight direct beatwins is 1.0. The average of Chicago’s is -2.0. So IND gets #1.
So, once IND is given the #1 ranking, then that means DEN has no other beatlosses. So I compare DEN and CHI. DEN has seven direct beatwins. I take that average, and compare it to the average of Chicago’s seven strongest direct beatwins. DEN: 3.143. CHI: 1.143. DEN gets #2.
Once DEN is removed, that opens up KC, BAL, and NE – all to be compared against CHI. They all have at least four direct beatwins, so I compare their averages for each of their best four direct beatwins. CHI wins, so they’re #3. It’s close with Baltimore, but KC is way behind, and NE even further behind.
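For anyone who wants to play with the tiebreaker, here is a rough Python sketch of the comparison steps described above. It is only my reading of the description, not the actual program: the `strength` numbers stand in for the "own number system" placement scores (which are not published here), the function name is made up, and ties are broken alphabetically as an arbitrary assumption.

```python
def rank_teams(beats, strength):
    """Rank teams from a loop-free beatpath graph.

    beats:    dict mapping a team to the set of teams it has a direct beatwin over
    strength: dict mapping a team to a placement score (hypothetical input)
    """
    teams = set(beats) | {t for losers in beats.values() for t in losers}
    remaining = set(teams)
    ranking = []
    while remaining:
        # Candidates: teams with no beatloss to any team that is still unranked.
        candidates = sorted(
            t for t in remaining
            if not any(t in beats.get(o, ()) for o in remaining)
        )
        if not candidates:
            raise ValueError("graph still contains a beatloop")
        # Compare only as many beatwins as the candidate with the fewest has.
        k = min(len(beats.get(t, ())) for t in candidates)

        def score(team):
            best = sorted((strength[w] for w in beats.get(team, ())), reverse=True)[:k]
            return sum(best) / k if k else 0.0

        # max() keeps the first of equal scores, so ties go alphabetically (assumption).
        winner = max(candidates, key=score)
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


# Made-up six-team example, not real NFL data.
beats = {"A": {"C", "D"}, "B": {"D", "E", "F"}, "C": {"F"}}
strength = {"A": 5, "B": 4, "C": 3, "D": 1, "E": 0, "F": -1}
ranking = rank_teams(beats, strength)
```

In the made-up example, A and B both start with no beatlosses; A's two beatwins average 2.0 against 0.5 for B's best two, so A is ranked first, just as IND edged CHI.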
So, that’s how the rankings are figured. As far as the vertical placement in the graph, I wouldn’t read too much into it – that’s controlled by all the computer scientists at AT&T that wrote the graphics package I’m using.
For all I know it might be trying to minimize horizontal space or something.
BGcolor of teams was an arbitrary choice, based off of the team colors of who looked strongest in each division around a year ago. I’ve gotten attached to the colors and haven’t changed them since.
OAK=>PIT is not a fluke because PIT doesn’t have an alternate beatpath to OAK. Ah, but you might say, PIT beat KC, which has a beatpath to OAK. But remember the system removes smallest beatloops first, and then refigures.
1) remove splits, find beatloops
2) remove 3-team beatloops, find longer beatloops
3) keep going until no more beatloops are found (I think the record is a seven-team beatloop last year).
This keeps 19-team beatloops from showing up, since they are almost always straightened out by the removal of a far more relevant three-team beatloop.
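The smallest-loops-first steps above could be sketched roughly like this in Python. This is an illustration under my own assumptions, not the code behind the site: games are (winner, loser) pairs, a "split" is treated as a two-team loop, every loop of a given size is found before any of its arrows are removed, and the function names are invented.

```python
def cycles_of_length(edges, n):
    """Return every simple directed cycle with exactly n teams, as lists of arrows."""
    succ = {}
    for winner, loser in edges:
        succ.setdefault(winner, set()).add(loser)
    found = []

    def walk(path):
        for nxt in succ.get(path[-1], ()):
            if nxt == path[0] and len(path) == n:
                found.append(list(zip(path, path[1:] + [path[0]])))
            # Only walk through teams sorting after the start, so each loop is found once.
            elif nxt > path[0] and nxt not in path and len(path) < n:
                walk(path + [nxt])

    for start in sorted(succ):
        walk([start])
    return found


def remove_beatloops(games):
    """Remove splits, then 3-team loops, then 4-team loops, and so on."""
    edges = set(games)
    teams = {team for game in edges for team in game}
    for n in range(2, len(teams) + 1):
        # Find all loops of this size first, then drop their arrows together.
        for loop in cycles_of_length(edges, n):
            edges.difference_update(loop)
    return edges


# Made-up example: a 3-team loop and a 4-team loop that share the A=>B arrow.
games = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "E"), ("E", "A")]
remaining = remove_beatloops(games)
```

Removing the three-team loop A=>B=>C=>A takes A=>B with it, so the longer A=>B=>D=>E=>A loop no longer exists by the time four-team loops are checked, and B=>D=>E=>A survives as a beatpath.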
I do think that PIT is fairly close to regaining credit for some of their wins – but even in the vanilla variant, while OAK doesn’t have a beatpath to PIT anymore, OAK is still ranked ahead of them (it’s the HOU=>JAC=>PIT beatpath that is keeping them down).
I have a question about beatflukes. These are the games whose result the system decides is not accurate and removes from the graph. In the “How are beatloops resolved” section, there is a hypothetical graph showing Min=>NO, NO=>TB, NO=>Det, TB=>Min, Det=>Min. Since Min has 1 beatpath to NO, and NO has 2 completely independent beatpaths to Min, wouldn’t the beatfluke system discard Min=>NO, and leave a graph showing NO=>TB=>Min and NO=>Det=>Min? Doesn’t this contradict the resolution in that section?
And as a side comment, I think the word independent above is important, because let’s say instead of NO=>TB and NO=>Det we had NO=>Wash, Wash=>TB, and Wash=>Det. Then we wouldn’t know if the flukey part of the graph was Min=>NO or NO=>Wash (since both loops back to Min from NO go through Wash, i.e. NOT independent).
re #12: I think that in the MIN example you mention, both those beatloops are removed, so there is no alternate beatpath to lead to the finding of a fluke.
In fact, there are no fluke scenarios in the graphs on that page. Your second example would just create a MIN=>NO=>WAS=>TB=>DET=>MIN beatloop, and a MIN=>NO=>WAS=>TB=>MIN beatloop. The smaller second one would be removed, leaving a TB=>DET=>MIN beatpath graph. (Hope I don’t have that wrong, I am doing that in my head.)
I like the colors too and think I saw someone else use them (or a close variant) for a GREAT chart based on the DVOA rankings, so I was wondering if there was a “standard”.
Per Paul’s comment #4, there are actually 5 teams currently with 2 “fluke” wins, CAR, PIT, TEN, MIA, and BUF. Carolina got covered heavily last week, and their situation has improved since they won, which is an example of how the system can work.
The other 4 teams with 2 fluke wins have only 2 or 3 wins overall. TEN has only 2 wins, both considered flukes; the rest have 3 wins, and the one non-fluke win was vs. either TEN (in Miami’s case) or MIA (for BUF and PIT). These all seem to make sense to me.
So I do like how the system deals with flukes. Lose to poor teams, but beat an occasional “good” team, and the wins are considered flukes. Win more games, and the occasional loss to a poor team can get wiped off the record (either as a loop or a fluke the other way), and regain credit for wins vs good teams.
My point in comment #2 kinda ties into the explanation in comment #9 above. When comparing teams, you’re currently looking at strength of beatwins. But take, for example, the case between IND and CHI. Each has no beatlosses at the moment, but CHI has a fluke loss, a game that they should’ve won but didn’t. As that game was wiped out, there isn’t any penalty in the rankings for a loss that should’ve been a win. In the IND/CHI comparison, it doesn’t matter much, nor down through the rest of the comparison in the post, since all the teams mentioned except IND have a fluke loss. But should there be a penalty in the rankings for fluke losses, perhaps as an intermediate tiebreaker? Look at beatlosses, flukelosses, and then strength of beatwins. Just a suggestion.
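To make the suggestion concrete, here is a tiny Python sketch of that three-step ordering. It is purely hypothetical (the function name and record layout are mine); the IND and CHI strength averages are the ones quoted earlier in the thread, and the fluke-loss counts come from this week's fluke list.

```python
def rank_with_fluke_losses(records):
    """Order teams by fewest beatlosses, then fewest fluke losses,
    then highest average beatwin strength (hypothetical tiebreak order)."""
    def key(team):
        beatlosses, fluke_losses, avg_strength = records[team]
        # Negate strength so that a higher average sorts earlier.
        return (beatlosses, fluke_losses, -avg_strength)
    return sorted(records, key=key)


# team: (beatlosses, fluke losses, average strength of direct beatwins)
records = {
    "CHI": (0, 1, -2.0),  # one fluke loss this week (MIA=>CHI)
    "IND": (0, 0, 1.0),
}
order = rank_with_fluke_losses(records)
```

For IND and CHI the order comes out the same either way, which matches the point that it doesn't matter much in this particular comparison.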
Those who think it’s “unfair” that some teams lose wins due to beatflukes should check out the “How are beatloops resolved?” link on the main page.
You checked it? Good. The bottom line is that the algorithm is interested in being consistent, not being arbitrary, and producing a directed graph. There’s no effort to treat every arrow equally. For instance, you could have a situation where 2 teams have 5 common opponents, team A is 0-5 against the common opponents, and team B is 5-0 against the common opponents. If team A beat team B, and there are no other common games, then all 11 games are cancelled out, and team A and team B are considered equal despite 1-5 and 5-1 records, respectively.
This is an extreme example, of course (and it’s not all that realistic given the NFL schedule, since there would always be other common games). But the point is that even without beatflukes, the system already discards wins and losses in a way that can impact some teams more than others. With that in mind, we shouldn’t be focussing on whether beatflukes are “fair” – they’re just a part of an algorithm. We should focus on whether they make the system more accurate, and according to our host, they do.
As far as whether a “fair” system is possible: the only way I can think of to treat all wins and losses equally would be to make the graph a weighted graph, and decrease weights of arrows in a loop by the minimum amount necessary to remove the loop. So, in the extreme example I gave, all five 3-team beatloops are reduced by 1/5 of their previous strength, which leaves team B with five wins of strength .8, team A with five losses of strength .8, and the other 5 teams with one win and one loss of that strength. The only beatpath removed completely would be team A’s win over team B, which was reduced by 1/5 five times.
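Here is the arithmetic for that extreme example as a small Python sketch. It only covers this one case, where every loop shares the single A=>B arrow and so gets an equal 1/5 share of the reduction; the team names C1 to C5 and the function name are placeholders, and a general version would need a rule for picking each loop's reduction.

```python
from fractions import Fraction


def reduce_loops(weights, loops, share):
    """Subtract `share` from every arrow of every loop (arrows in several
    loops are reduced once per loop). Returns a new weight table."""
    reduced = dict(weights)
    for loop in loops:
        for arrow in loop:
            reduced[arrow] -= share
    return reduced


others = ["C1", "C2", "C3", "C4", "C5"]  # the five common opponents (placeholder names)
weights = {("A", "B"): Fraction(1)}      # team A's one win over team B
for c in others:
    weights[("B", c)] = Fraction(1)      # team B went 5-0 against them
    weights[(c, "A")] = Fraction(1)      # team A went 0-5 against them

loops = [[("A", "B"), ("B", c), (c, "A")] for c in others]
reduced = reduce_loops(weights, loops, Fraction(1, len(loops)))
```

After the reduction, A=>B has weight 0 and the other ten arrows each keep 4/5 of their weight.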
TT, if you’re interested in implementing this sort of approach, shoot me an e-mail and I can explain more about how the algorithm would work.
Just so you know, I’m not saying that removing beatwins that are determined (in a consistent, repeatable manner) to be flukes is in any way unfair. I finally was able to get my mind around what has been bothering me though, which I tried to say in the previous post, but I’m not sure I was able to explain it.
The rankings (not the graph) first look at the teams with the fewest beatlosses. If there is a tie, then there is an algorithm to see which of those teams has the strongest set of beatwins.
When a game is determined to be a beatfluke, the results are removed from the graph. This removes a beatwin from the team that won, but more importantly to my point, it removes a beatloss from the team that, according to the graph at the time, should’ve won the game.
The rankings then look at beatlosses. A team that lost one game that was determined to be a fluke would have no beatlosses, the same as a team that had not lost a game. Then their beatwins are compared. It seems in there somewhere should be something that takes those losses into account.
Here is my hypothetical example, which is how I understand it all works. Correct me if I’m wrong (which is entirely possible). Start with the following beatpaths I just made up:
CHI->NYG->MIN->GB
CHI->DET->GB
Then GB beats CHI. Without beatflukes, this would result in a CHI->DET->GB->CHI beatloop, which is removed, leaving the CHI->NYG->MIN->GB beatpath. With beatflukes, the alternative path allows the GB->CHI outcome to be considered a fluke. This restores the CHI->DET->GB beatpath, leaving the situation unchanged (ignoring any other game outcomes). When the ranking program comes along it looks at the CHI beatlosses (no new ones added), and then the beatwins (none lost), so the situation has not changed, even though CHI lost a game they “should’ve won”.
Keep in mind this was a hypothetical and probably oversimplified example.
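The hypothetical above can be checked with a short Python sketch. The fluke rule here is my understanding from this thread (a win is a fluke if, once the loop's arrows are removed, the loser still has a beatpath to the winner), so treat it as an assumption rather than the site's actual code; the function names are made up.

```python
def has_beatpath(edges, start, goal):
    """True if `start` can reach `goal` by following winner->loser arrows."""
    succ = {}
    for winner, loser in edges:
        succ.setdefault(winner, set()).add(loser)
    stack, seen = [start], {start}
    while stack:
        team = stack.pop()
        for nxt in succ.get(team, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_beatfluke(game, loop, edges):
    """Assumed rule: with the loop's arrows removed, does the loser of
    `game` still have an alternate beatpath to the winner?"""
    winner, loser = game
    remaining = set(edges) - set(loop)
    return has_beatpath(remaining, loser, winner)


edges = {
    ("CHI", "NYG"), ("NYG", "MIN"), ("MIN", "GB"),  # CHI->NYG->MIN->GB
    ("CHI", "DET"), ("DET", "GB"),                  # CHI->DET->GB
    ("GB", "CHI"),                                  # the upset
}
loop = [("CHI", "DET"), ("DET", "GB"), ("GB", "CHI")]
fluke = is_beatfluke(("GB", "CHI"), loop, edges)
# If it is a fluke, only the upset is dropped and the loop's other arrows are restored.
restored = edges - {("GB", "CHI")} if fluke else edges - set(loop)
```

With the CHI->NYG->MIN->GB path present, the upset is flagged as a fluke and CHI->DET->GB stays in the graph; without that path, the whole three-team loop would be removed instead.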
One thing is that if a good team loses a couple of beatflukes (that are then not counted), it means that they’re less likely to have as strong a score when you look at the strength-of-beatwins. So in that sense, the beatfluke loss still does have a minor and indirect impact.
But by the time you get to that point, so much is dependent on strength of schedule, which is honestly an inherited flaw from the NFL itself. It’s impossible for the NFL to predict how good each team is going to be, so it can turn out that a team will have a much more difficult schedule than was initially intended. I am pretty sure that is what happened with Dallas in 2005 – according to the beatpath graph, they should have been in the playoffs, but they just had too many losses for the NFL to put them in.