2009 NFL Week 14 Beatpath Graph

A couple of crazy upsets this week, and we’re going vertical again. At 12-4, Isaacson-Tarbell led all variants – this seems to be a pattern late in the season, even though it still has a ways to go to make up for its early-season stumbles. Beatpaths was 10-6, and ITB was 11-5. I was 1-3 for my personal picks, so I am now one ahead of ITB for the season. The BeatPicks were 5-1, or 50-19 for the season.

So, here’s the NFL Week 14 Beatpath Graph. Loops after the jump.

2009-14-nfl-clean.png

Removing OAK split from KC

Removing ATL split from CAR

Removing NYJ split from NE

Removing MIA split from NE

Removing SF split from SEA

Removing DEN split from SD

Removing TEN split from JAC

Removing NYJ split from BUF

Removing MIA split from BUF

Removing CLE split from PIT

Removing HOU split from TEN

Loop: WAS=>STL=>DET=>WAS

Loop: JAC=>HOU=>SEA=>JAC

Loop: ARI=>HOU=>SF=>ARI

Loop: HOU=>SF=>JAC=>HOU

Loop: NYG=>WAS=>DEN=>NYG

Loop: NYG=>DAL=>PHI=>NYG

Loop: NYG=>OAK=>PHI=>NYG

Loop: CHI=>PIT=>MIN=>CHI

Loop: ARI=>MIN=>SF=>ARI

Loop: MIN=>BAL=>PIT=>MIN

Loop: MIN=>CIN=>PIT=>MIN

Loop: WAS=>DEN=>KC=>WAS

Loop: WAS=>OAK=>PHI=>WAS

Loop: TB=>GB=>DAL=>TB

Loop: KC=>PIT=>SD=>KC

Loop: KC=>PIT=>DEN=>KC

Loop: SD=>OAK=>PIT=>SD

Loop: DEN=>CIN=>PIT=>DEN

Loop: BAL=>DEN=>CIN=>BAL

Loop: DEN=>OAK=>PIT=>DEN

Loop: DEN=>DAL=>WAS=>DEN

Loop: BAL=>DEN=>NE=>BAL

Loop: ARI=>JAC=>NYJ=>TEN=>ARI

Loop: ARI=>JAC=>NYJ=>CAR=>ARI

Loop: ARI=>JAC=>BUF=>CAR=>ARI

Loop: ARI=>NYG=>ATL=>SF=>ARI

Loop: ARI=>NYG=>DAL=>CAR=>ARI

Loop: OAK=>CIN=>BAL=>SD=>OAK

21 Responses to 2009 NFL Week 14 Beatpath Graph

  1. doktarr says:

    HOU→CIN is back in iterative as well, although Jacksonville getting out from under Seattle has kept this from adding any to the height of the graph (20 last week, 19 this week). Minnesota keeps its win over Cincy, but still only has a path to the bottom 16 teams. Minnesota has played an exceptionally weak schedule, mostly just racking up wins against bad teams. They won’t really be able to get slotted against the other top teams until after week 17 and the playoffs.

    I think it’s a bit absurd to see teams like Denver, Arizona, and the Giants having no relationship to the majority of the graph at this point in the season. We have a lot of data on these teams, and in my opinion the graph itself should be able to reflect our opinion on whether the Giants should be favored over Detroit or Cleveland. Sometimes it feels to me like we shouldn’t call them the “Beatpath picks”, but the “picks based on an algorithm that looks at what’s left over after we throw out a ton of data” picks.

    I also feel that the way sweeps are handled in standard makes them too significant. A sweep can be erased if it is in one 3-team loop and then one 4-team loop, but a sweep that’s in 10 different 3-team loops doesn’t get erased. I can’t think of any logical reason why that should be the case.

    I’m sorry I keep harping on this stuff. I don’t like being so negative and being the curmudgeon of this particular forum. I keep coming around here because I feel there’s an enormous amount of potential in this approach. I just feel this one small flaw is causing a lot of extra data to be discarded, and really hurts both the stability of the method and the ability of the graph to display what we really know at this stage of the season. Even if the iterative picks aren’t very much more accurate than the standard picks (I think it’s a pretty small edge), the fact that so many more of those picks are beatpicks, and the fact that the graph is so much more stable, are very positive side effects.

    I think this would be even more clear if we looked at a sport with a larger data set, like Basketball or Hockey. Iterative would reliably give us a very vertical graph for those sports, so that the ranking algorithm becomes not much more than a fallback method that’s only used when closely-ranked teams face off.

  2. Tom says:

    One wonders how long Atlanta can hold up the firmament without Ryan.

    At least the reemergence of HOU=>CIN isn’t as bizarre this week.

  3. Tom says:

    @doktarr: teams like Denver, the NY Giants, and Pittsburgh have few connections because of their streaky inconsistency. How do you judge, from W-L alone, a team that starts 5-0 or 6-0 and then loses 4 or 5 straight, including to bad teams? You’re right that we have a lot of data on them, but it’s contradictory and inconsistent data. I am sympathetic to your argument that throwing out data isn’t intuitively the best option, but it’s not clear that the converse is automatically true–keeping ambiguous data may be *more* misleading.

    Also, I’d be interested to see the numbers on iterative’s pick rate versus standard. As you imply in most of your posts, you argue that iterative has better picks and more stability. I’d like to see the numbers. You have side-by-side data over at MOOSE’s site for several seasons–you ought to be able to demonstrate quite clearly what you keep implying. Wouldn’t that be satisfying and conclusive?

  4. ThunderThumbs says:

    Doktarr, no apology necessary for the harping, I think it’s part of what makes the site interesting.

    Regarding football, there have been a few horrendously complicated algorithms that try to pay attention only to wins and losses and who beat who, just like beatpaths. Unlike beatpaths, they involve strange mathematical symbols, are hard to understand, and don’t have pretty graphs. And so far, I haven’t found one that significantly outperforms beatpaths. Maybe by a few tenths of a percentage point, but I don’t think you can really get much better than that.

    So when we’re comparing things like iterative, beatflukes, or beatpaths, it really depends on what our objective is. Is it accuracy? Because I’m not sure there’s a lot of improvement to be found there. Or is it instead to simply try and display more data, to give people more to go on, to reflect conventional wisdom a bit more, even if it sacrifices accuracy a bit? I think this isn’t necessarily an approach that should be ruled out. There are a lot of fun ways to explore the nfl using just wins, losses, and graphs, and all the data that can be derived from them.

    As for the graph, I agree entirely that a larger sample size could make for more interesting data, which is why I’m interested in continuing with basketball.

    Also agree with Tom about streaky teams. That doesn’t so much have to do with the sequence of the games, but every season there are teams that are just kinda bipolar. Whether it’s a young team like Denver that may or may not be outplaying their potential, or a team like Pittsburgh that is completely different depending on if one player is in the line-up… (the idea of splitting a team into two identities has occurred to me more than a few times)… sometimes looking at a team’s entire-season performance is going to be problematic. However, that’s a problem that the league has too, so I’m okay with that.

  5. ThunderThumbs says:

    Here’s kind of a backwards way of playing with the “ambiguous graph” problem.

    1) Generate the graph and the power rankings as they are generated now.
    2) Using the graph, restore all games to the graph that don’t conflict with the rankings.
    3) Refigure graph and rankings. And if necessary, redo steps #2 and #3 until there are no changes.

    So in Denver’s case, it would result in SD->DEN->NYG and DEN->OAK being restored to the graph. The following games would still be ignored: DEN->CIN, DEN->NE, PIT->DEN, BAL->DEN, and WAS->DEN. (And the other SD->DEN).

    Then you’d refigure the graph and who knows what that would do – it’s possible it might knock Denver above Cincinnati and then you’d have to refigure again until there were no more changes.
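The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `net_wins` is a hypothetical stand-in for the site's real power rankings (which are not described here), the game list is made up, and the sketch only re-adds games; it does not re-run loop resolution when it "refigures" the graph.

```python
def net_wins(teams, kept):
    """Toy stand-in for the real power rankings: kept wins minus kept losses."""
    score = {team: 0 for team in teams}
    for winner, loser in kept:
        score[winner] += 1
        score[loser] -= 1
    return score


def restore_consistent_games(games, kept, rank=net_wins):
    """Re-add dropped games that agree with the rankings until nothing changes."""
    teams = {team for game in games for team in game}
    kept = set(kept)
    while True:
        scores = rank(teams, kept)  # steps 1 and 3: (re)figure the rankings
        # Step 2: a dropped game is restored if its winner outranks its loser.
        restored = {(w, l) for w, l in games
                    if (w, l) not in kept and scores[w] > scores[l]}
        if not restored:
            return kept
        kept |= restored


# Hypothetical data: A=>B=>C=>A is a loop, so only the other games start out kept.
games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("A", "E"), ("B", "D")]
kept = {("A", "D"), ("A", "E"), ("B", "D")}
final = restore_consistent_games(games, kept)
print(sorted(set(games) - final))  # prints [('C', 'A')]
```

In this toy case A=>B and B=>C come back because they agree with the rankings, while C=>A stays out, which is the "fluke" reading described in the steps. The loop always ends because games are only ever added and the game list is finite.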

  6. ThunderThumbs says:

    Ok, I just tried deleting all “fluke” games and re-running. It’s somewhat interesting. The graph looks a little cleaner, and the rankings are more similar than I thought. The top ten are the same in order, although IND, NO, and SD are exactly tied for first place. A few other teams next to each other swap places. And then further on down the graph, Seattle climbs and Tennessee falls. And actually, Denver does jump ahead of Cincinnati, so if I then add that victory back in, Denver climbs more, and NO falls to third (one point behind SD and IND).

    I’m torn on whether to actually try and implement this as a variant; it would involve taking apart a fair amount of the code. Interesting exercise, though.

  7. doktarr says:

    Tom, the lack of rankings of NYG, PIT, or DEN has literally nothing to do with their streakiness, as we have no time decay in the formula. It does have something to do with the inconsistency of their results, of course.

    Of course the data is inconsistent and somewhat ambiguous – otherwise we wouldn’t have any beatloops. My point is that the current approach decides which data to throw out and which data to keep in a very arbitrary way. There’s no real logic in a system that could allow one loss to loop away five or six wins, but allows the team with that loss to keep its “strength of schedule” wins because those can’t be in three-team beatloops. There’s no real logic in a system that allows a season sweep to hold up even if it conflicts with four or five different three-team beatloops, but gets rid of that sweep if it is in exactly one three-team beatloop and one four-team beatloop.

    I haven’t ever really read any argument from anyone why these things are good or make sense. TT has implied that he doesn’t want to only get rid of one game in a loop because this treats that game differently than the other. But in my mind, the exact opposite is true. If we allow the result of one game to cause several other games to be thrown out, then it is that one game that is being over-weighted.

    TT, I don’t really understand why you would want to throw out a bunch of data, and then attempt to restore some of it using a separate approach. Why not just throw out the minimum amount of data required to produce a directed graph in the first place?

  8. doktarr says:

    As far as the goal – yes, it is to increase accuracy (at least a posteriori accuracy), and also to display more data.

    I don’t have the code, so it would be a pretty horrendous job for me to go in and compile records. Perhaps MOOSE could go through and compile how many picks for the year each approach gets wrong based on that year’s end-of-season graph. Personally, I’d be interested in the following “bins”:

    Was the pick a beatpick or not?
    Was it a beatpick or not for the other approach?
    Was there a disagreement between the approaches?

    That gives eight bins for each approach, iterative and standard/classic/parallel.
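The three yes/no questions above give 2 × 2 × 2 = 8 bins per approach. A minimal sketch of such a tally, assuming each pick is recorded as a tuple of four booleans (a hypothetical layout; the site's actual data format is not known here):

```python
from collections import Counter
from itertools import product


def tally(picks):
    """Count right/wrong picks in each of the eight bins.

    picks: iterable of (own_beatpick, other_beatpick, disagree, correct),
    all booleans, seen from the point of view of one approach.
    """
    bins = {key: Counter() for key in product((True, False), repeat=3)}
    for own, other, disagree, correct in picks:
        bins[(own, other, disagree)]["right" if correct else "wrong"] += 1
    return bins


# Made-up picks, purely to show the shape of the output.
sample = [(True, False, True, True), (True, False, True, False),
          (False, False, False, True)]
for key, record in tally(sample).items():
    print(key, dict(record))
```

Running the same tally once for iterative and once for standard would give the side-by-side comparison being asked for.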

    Since I don’t have the database/code, though, let me provide a very simple example that shows why I like iterative.

    A→B
    A→C
    B→D
    C→D
    D→A

    Run the current standard algorithm on this data, and everything gets wiped out. If asked to predict the results of these games after the fact, the standard algorithm would produce a 0-0-5 picks record.

    Iterative would reduce the weight of the first four games by 50% each, and wipe out D→A. If asked to predict the results of these games after the fact, the iterative algorithm would produce a 4-1 picks record.
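The five-game example can be checked with a short sketch. The two rules below are assumptions reconstructed from the descriptions in this thread, not the site's actual code: "standard" drops every game that sits in any 3-team loop, and the shared-loop rule lets a game that sits in k loops split its weight k ways, with each loop removing its smallest share from all three of its games. A retroactive pick counts as right if the surviving graph has a path from the winner to the loser.

```python
from itertools import permutations

# Games from the example above, as (winner, loser) pairs.
GAMES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]


def three_team_loops(edges):
    """Return each 3-team loop once, as a tuple of its three games."""
    teams = sorted({team for edge in edges for team in edge})
    loops = set()
    for x, y, z in permutations(teams, 3):
        if x == min(x, y, z):  # fix the starting team so no loop is repeated
            cycle = ((x, y), (y, z), (z, x))
            if all(edge in edges for edge in cycle):
                loops.add(cycle)
    return sorted(loops)


def standard_weights(games):
    """Assumed standard rule: any game in a 3-team loop is dropped entirely."""
    looped = {e for loop in three_team_loops(set(games)) for e in loop}
    return {e: (0.0 if e in looped else 1.0) for e in games}


def shared_loop_weights(games):
    """Assumed iterative-style rule: loops sharing a game split its weight."""
    loops = three_team_loops(set(games))
    count = {e: sum(e in loop for loop in loops) for e in games}
    weights = {e: 1.0 for e in games}
    for loop in loops:
        cut = min(1.0 / count[e] for e in loop)  # weakest share in this loop
        for e in loop:
            weights[e] -= cut
    return weights


def has_path(weights, src, dst):
    """Depth-first search over games that still carry positive weight."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for (winner, loser), weight in weights.items():
            if winner == node and weight > 0 and loser not in seen:
                seen.add(loser)
                stack.append(loser)
    return False


def retro_record(games, weights):
    """Return (right, wrong, no_pick) when re-picking the games after the fact."""
    right = wrong = no_pick = 0
    for winner, loser in games:
        if has_path(weights, winner, loser):
            right += 1
        elif has_path(weights, loser, winner):
            wrong += 1
        else:
            no_pick += 1
    return right, wrong, no_pick


print(retro_record(GAMES, standard_weights(GAMES)))     # prints (0, 0, 5)
print(retro_record(GAMES, shared_loop_weights(GAMES)))  # prints (4, 1, 0)
```

D→A sits in both loops (A→B→D→A and A→C→D→A), so it loses half its weight to each and disappears, while the other four games keep half a win apiece. This single pass is enough for the toy case; a full implementation would presumably repeat until no loops remain and handle longer loops as well.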

  9. doktarr says:

    Correction – it’s not impossible for the two SoS games to be in a 3-team beatloop. It’s just a lot less likely because there’s only 3 teams that have an opportunity to create that loop.

  10. Jonathan says:

    Or maybe just make it so a sweep is never removed? I admit that this could mean that some loops couldn’t be resolved. Maybe a sweep should be removed if and only if another sweep is in the same loop (which probably wouldn’t happen).

    That way a sweep would remain even if it is both in a 3-team and a 4-team loop. It’s a sweep… that just isn’t ambiguous unless you have a looping sweep like Team A sweeps Team B which sweeps Team C.

  11. Tom says:

    @doktarr:

    You wrote, “There’s no real logic in a system that could allow one loss to loop away five or six wins, but allows the team with that loss to keep its ‘strength of schedule’ wins because those can’t be in three-team beatloops.”

    I’m not clear on why this is a problem. Do you feel that it’s unjust, or arbitrary, or what? Why is it being easier to keep those two strength-of-schedule wins problematic?

    Regarding your mini example, I have two questions:

    1) Given that we have no objective way of distinguishing which games are more flukey than any others, why is ignoring a game because it’s present in the most loops justifiable? Why does its presence in the most loops make it unrepresentative of the team’s play?

    2) You conclude saying, “If asked to predict the results of these games after the fact, the standard algorithm would produce a 0-0-5 picks record.” But this is a misrepresentation of the scenario. Those five games in your example could not have all happened in one week; they’re the product of at least three weeks of play. In those three weeks, data has been collected on A, B, C, and D’s performance against other teams. So even if their four team loop gets erased, we can still place them within the larger connected graph, and thereby produce a retroactive pick record. We can’t say that either Standard or Iterative would produce an 0-5 or 4-1 record respectively, given that we don’t have the rest of the information from the three weeks of play.

    In any case, arguing theory without having data to back it up seems to result in all of us spinning our wheels. I’ll be excited to see Iterative’s week-on-week pick record, and its regular-season-end retroactive pick record for comparison purposes.

  12. doktarr says:

    “Do you feel that it’s unjust, or arbitrary, or what?”

    Arbitrary. It’s leading us to throw out some data and keep other data because of the way the algorithm resolves loops, but which data is held onto and which is discarded is a sort of artifact of the process as opposed to the result of some reasoning about why we think this game or that game is inconsistent.

    Nobody can explain why a season sweep can be in 10 three-team beatloops and still appear in the graph, while a season sweep that is in one three-team beatloop and one four-team beatloop doesn’t. That’s because there ISN’T a reason. It’s just the way the algorithm works.

    “Given that we have no objective way of distinguishing which games are more flukey than any others, why is ignoring a game because it’s present in the most loops justifiable? Why does its presence in the most loops make it unrepresentative of the team’s play?”

    But we DO have an objective way of distinguishing which games are more flukey than others. The presence of a game in a loop, especially a small loop, is evidence that it might be flukey. That assumption is the entire basis of what we’re doing here. It’s by throwing out those questionable games that are in loops that we get an acyclic graph.

    The only leap I’m making in the iterative algorithm is saying that being in MORE loops means that it might be MORE flukey. That’s a very simple, obvious, and logical step to make.

    Furthermore, let’s be clear about something: I’ve put out a refinement of the algorithm based on this assumption (along with a desire for the final graph to have some other nice properties). By contrast, the original algorithm has… no real counter-assumption. We throw out some games but keep others for reasons that are, as I said, essentially arbitrary. Nobody has even *attempted* to articulate why the odd examples I’ve pointed to are resolved the way they are, other than by appealing directly to the mechanics of the beatloop resolution process.

    To paraphrase Walter from The Big Lebowski, “Say what you will about the tenets of the iterative algorithm, but at least it’s an ethos.”

  13. doktarr says:

    “But this is a misrepresentation of the scenario. Those five games in your example could not have all happened in one week; they’re the product of at least three weeks of play. In those three weeks, data has been collected on A, B, C, and D’s performance against other teams. So even if their four team loop gets erased, we can still place them within the larger connected graph, and thereby produce a retroactive pick record. We can’t say that either Standard or Iterative would produce an 0-5 or 4-1 record respectively, given that we don’t have the rest of the information from the three weeks of play.”

    You’re essentially saying that we shouldn’t worry about throwing out all that data that would have led to a 4-1 pick record, because some other data might rescue us and we’ll get them right anyway. But we could just as easily end up in a situation where the supporting data would not slot these teams at all. This counter-argument doesn’t really address the point. We have meaningful data, and we’re casting it aside for an arbitrary reason.

    In fact, you could end up with data that indirectly supports the opposite conclusions. For instance, say no beatpath relationship emerges between the teams, but B ends up with more support under it. The ranking algorithm will then pick B over A.

    This sort of thing happens all the time. It’s why the iterative algorithm correctly predicts all 13 games the Giants have played this year, with all of them being beatpicks, while the current algorithm is 6-0 in beatpicks, but relies on the ranking algorithm for the other 7 picks, going just 4-3. It’s throwing out useful data that suggests that the Giants actually haven’t been that inconsistent after all – they just played an easy schedule that got a lot harder.

  14. ThunderThumbs says:

    Nah, at this point I’d say your characterizations are unfair.

    First, my viewpoint is definitely NOT that a game being in a beatloop is evidence that it is flukey. This is a subtle point but it’s important. What I’ve said over and over again is that it means the relationship between the teams in the loop is ambiguous. That is all. So we remove the ambiguity, and we use the rest of the graph to determine the relationship between those teams.

    At THAT point, we can try and identify which of the games in the previously removed beatloop may be “flukey”. And that also answers your earlier question about my experiment the other night. But it would be the rankings that tell us that, not the number of loops it is in.

    The presence of a game in a loop is not evidence that a game might be flukey, though. And so in turn, if a game is in more loops, it does not mean that the game might be more flukey. A game that is in more loops could just as easily (or more easily) be because of scheduling quirks.

    My other main problem with iterative is it puts a numerical value on the wins, like “half a win” or “1/3 a win”. When my own desire is to completely avoid complicating factors like that. I like sticking to principles like a win is a win.

    As for the handling of season splits, I’ve tried my best to explain it in the past. I don’t think it’s ideal either, but I won’t change it unless or until we come up with something that is more elegant, and not just awkward in a different way. I agree with Moose’s point that some of this is really just because of the NFL scheduling. The season split stuff doesn’t appear to be a problem in NBA or MLB graphs.

  15. doktarr says:

    Continuing from the Giants example, let’s look at the other three teams that are largely detached from the graph this week.

    Denver:

    Let’s set aside the season split with SD, since obviously any approach will go 1-1 on those, and there’s no beatpath relationship in either approach.

    In iterative the beatpicks are 8-3, missing all the games of the November swoon aside from SD (losses to PIT, BAL, and WAS).

    In the current algorithm, the beatpicks are 3-0, and the non-path picks are 2-6, missing the same three losses, plus the wins over CIN, NE, and DAL.

    Arizona:

    In iterative, the beatpicks are 8-4, missing both losses to SF, as well as the losses to CAR and TEN. Iterative is 1-0 in non-path picks, as ARI has much more support under it than MIN despite no beatpath relationship.

    In the current algorithm, the beatpicks are 5-0. The non-path picks are 3-5, missing on Minnesota but otherwise making all the same picks.

    Carolina:

    As with Denver, let’s set aside the split (with Atlanta in this case) and look at the other 11 games.

    Iterative is 10-1 in beatpicks, getting the win over Arizona wrong, but everything else right.

    The current algorithm is 6-0 in beatpicks, and 4-2 in non-path picks, getting the win over Arizona and the loss to Buffalo wrong.

  16. doktarr says:

    “So we remove the ambiguity, and we use the rest of the graph to determine the relationship between those teams.”

    But you don’t remove the ambiguity if it’s part of a season sweep… but you do if it’s in a four team loop.

    “At THAT point, we can try and identify which of the games in the previously removed beatloop may be “flukey”. And that also answers your earlier question about my experiment the other night. But it would be the rankings that tell us that, not the number of loops it is in.”

    Why would we throw out a bunch of data _before_ we start to look at what is or isn’t “flukey”? The idea of putting games back in really ruins your earlier argument, that you want to remove the ambiguity. You’re putting this ambiguity back in, but only part of it.

    The iterative algorithm basically lets the games vote on which games should get off the island, and keeps going until there aren’t any that don’t fit. It’s beatpath survivor!

    “My other main problem with iterative is it puts a numerical value on the wins, like “half a win” or “1/3 a win”. When my own desire is to completely avoid complicating factors like that. I like sticking to principles like a win is a win.”

    If it makes you like it better, you can completely ignore those factors after the loops are resolved. The point is to keep as much of the data as possible, and do it in a way that allows the graph to actually reflect more results more accurately.

    “I don’t think it’s ideal either, but I won’t change it unless or until we come up with something that is more elegant, and not just awkward in a different way.”

    Are you saying that even if the iterative graph is:
    1) More stable over the course of the season
    2) Able to display more of the picks in a visual fashion
    3) Slightly more accurate at predicting the games a priori, and significantly more accurate at “predicting” the games a posteriori,

    That you wouldn’t want to switch because it’s “awkward in a different way”?

    “I agree with Moose’s point that some of this is really just because of the NFL scheduling. The season split stuff doesn’t appear to be a problem in NBA or MLB graphs.”

    Working with a sparse data set will always be harder. But I’d like to see what iterative produces for those graphs before we say it’s all the fault of the NFL schedule. I suspect they would be much more vertical.

  17. ThunderThumbs says:

    Doktarr, the season split thing really isn’t as big a problem as you’re emphasizing.

    If there’s a season split, and there’s a beatloop for one of the games, the beatloop removes the one game, but the second game remains, so there’s still a beatpath. This is a good thing.

    Every single other three-team beatloop is removed at the same time (except for fully and entirely reinforced beatloops, as we’ll see in the nba and mlb), which drastically reduces the other longer possible beatloops.

    It really is a relative rarity. In mlb and nba there are almost zero beatloops longer than three teams at the end of the season. You’re complaining about a case that involves one game of a sweep being in multiple shared beatloops, and a second game in a sweep being in a relatively rare longer beatloop. This just doesn’t happen very often, and when it does it would make sense that it would wreak a little bit of havoc on the graph:

    If a team sweeps another team that otherwise seems clearly better than it, there is pretty clearly some bipolar identity conflict going on. It’s pretty impossible to reconcile a team’s full-season performance into one picture. You’re either going to have to ignore one set of games, or another set of games. And depending on their week-to-week performance, we’ll have evidence to look at it one way, or another. It’s like balancing on the head of a pin.

    We saw this a couple of seasons ago with Houston (much worse than they are now) sweeping Jacksonville (much better than they are now). It caused a lot of tension in the graph, and sometimes Houston would be down near the bottom, and other times it was like the system was saying, “man, twice – maybe they really are better than jacksonville.”

    I don’t see why it doesn’t make sense to you that it should be difficult to throw out a season sweep. This is a team that beat the other team head to head twice in one season, both home and away. That should be held to a really high bar.

    As for iterative, yes, I am saying (and have said before) I have a basic bias against the idea, because I don’t see a human-english rationale for believing that a game is more likely to be flukey just because it is in more shared beatloops. It feels a bit like making the data fit a conclusion. Conceptually I can make just as much of a defense that the other teams in the loop “really should have” beaten the team on the shared beatloop, and should be penalized for not doing so.

    As for retrodictive accuracy, a loop can have two flukey games and one non-flukey game. Coming up with rankings that contradict the fewest *number* of games possible each season is an interesting problem, but it isn’t how we’ve stated the objective. It would be trivial to come up with a ranking that would have greater retrodictive accuracy than the current method yields. Heck, this week I think CIN is one slot ahead of DEN – I could just flip those and it’d be more accurate in terms of retrodictive accuracy. But the rankings are saying that given the season so far and the relative performance of the two teams, Cincinnati would have the edge. And so far I haven’t found a beatpath variant that beats edgepower’s full-season predictive performance.

    Anyway, I really think it’s silly for us to keep on arguing about this – we both know where each other stand! You have a hypothesis, and it’s largely untested. I really do intend to test it, it’s just that it’s a hobby site that makes me no money and I need to find time to implement it when I’m not working for clients, songwriting, or keeping this site going. I will, I just don’t know when. My bias against it – I’m just remarking on it, it doesn’t mean that I’m not going to code in support for it. This isn’t a religious difference. In the meantime I hope you keep sticking around, drawing contrasts, and helping to suggest other cool variants or ways of looking at it along the way.

  18. doktarr says:

    I don’t have any problem with it being difficult to throw out a season sweep. I have a problem with the way it is handled being inconsistent with the way every other win is handled. As I said, if a sweep is in 10 different 3-team beatloops, we don’t throw it out. If it’s in one 3-team loop and one 4-team loop, we do.

    “We saw this a couple of seasons ago with Houston (much worse than they are now) sweeping Jacksonville (much better than they are now). It caused a lot of tension in the graph, and sometimes Houston would be down near the bottom, and other times it was like the system was saying, “man, twice – maybe they really are better than jacksonville.””

    Right, and that’s sort of my point. The wild shifts this caused in the assessment of Jacksonville really didn’t make sense, given all the data we had, and it was only due to some oddities in the system that JAC kept jumping around.

    Contrast that to iterative, which kept both teams in a pretty stable spot from there forward:

    http://beatgraphs.com/archive/iterative/2006/I_2006_10_NO.php

    “I don’t see a human-english rationale for believing that a game is more likely to be flukey just because it is in more shared beatloops.”

    Hm. Maybe this is a case of missing the forest for the trees. Take a step back. What is a flukey game usually called? An upset. Upsets are called upsets because they conflict with expectations. What forms our expectations? Previous results.

    When a team that’s way over .500 loses to a team that’s way under .500, we call that game a fluke, or an upset. And it’s much more likely that the winner of the game has lost to teams that the loser has beaten when the winner is way below .500 and the loser is way above.

    Only one game shows up in three three-team beatloops this week: Washington over Denver. Honestly, wouldn’t that have been one of your first votes for “flukiest wins of the year”?

    The extreme example I like to give is the 15-1 team and the 1-15 team, where all 16 of their games are erased by 16 3-team beatloops, and those two teams are considered equal and complete unknowns. I think it’s nearly impossible to argue that we don’t know which game is the fluke. You can argue that that’s an extreme, unrealistic example (try telling it to the 2004 Pats, though) but the reality is that we should expect the algorithm to be able to handle a situation like that.

    “And so far I haven’t found a beatpath variant that beats edgepower’s full-season predictive performance.”

    Iterative has done better. In terms of prediction it’s been close, but I’ve made plenty of posts where I’ve pointed out the “conflict picks”, and I’m pretty confident that over the years the iterative picks have come out slightly ahead.

    Moreover, there are the other advantages:

    - A much more consistent ranking. Having a graph that changes radically every week conflicts with our intuitive notion that teams are fairly stable in quality and should be stable in ranking unless something changes. The fact that iterative is much more stable should be seen as a positive.

    - A much more vertical graph. This is really nice in my opinion because it means that much of the ranking can be appreciated visually just by looking at the graph. You can argue that the ultimate goal is the rankings and everything else is irrelevant, but to me the actual graph is where it’s at.

    - Better retrospective prediction. Yes, this is not the be-all of the approach, but isn’t part of the goal of the rankings to be descriptive of the season, not merely predictive? Aren’t the two closely related, after all?

    “You have a hypothesis, and it’s largely untested. I really do intend to test it”

    Well, MOOSE already has iterative graphs going back through 2001. My guess is that he could cook up predictive/descriptive records of each algorithm fairly easily.

    And don’t worry, I do intend to stick around. I’ve been advocating iterative here for, what, three years, now?

    I guess my focus has always been a bit different than yours, since I see this as a fantastic way to do strength of schedule adjustments, as opposed to really caring about the purity of just looking at wins. I like the win metrics more than the point metrics, but only because experience has shown that the point metric is pretty lousy. I’d still LOVE to see this approach applied to something like game-by-game VOA data from footballoutsiders. I keep pestering Aaron Schatz to include game-by-game VOA data in the premium content.

  19. mm says:

    Isn’t this the same argument from previous years?

    Yes, if you want to have the best predictor for previous games, you’d want the system that leaves the most games in. Iterative will leave in more games than the other systems and so should be better than the other systems at predicting games that have already been played.

    That doesn’t mean that iterative is the best at predicting future games, or that it is more correct at showing the ‘true’ relationship between teams. The first would have to be shown, the second really comes down to opinions.

  20. doktarr says:

    My long reply from last night appears to be in moderation purgatory. But mm seems to have summed up the debate in a fairly satisfactory way in five sentences.

    The only thing I would add in iterative’s favor is that whether or not it’s displaying the ‘true’ relationship, the iterative graph does give us a more descriptive visual display of what’s happened during the season.

    MOOSE, is there any chance you would be able to crank out some comparisons of predictive accuracy over the last 8+ seasons? If you give me a peek at the code, I could try to take a crack at it myself (depending on the language).
