Pick Confidence – Wildcard Round

Tom once again—a bit late in the day, but I thought I’d get these up here as the games started in earnest.

Pick confidence is out of 100; BeatPower comparison is predicted winner – predicted loser.

Game                     Pick confidence   BeatPower comparison   Result
Indianapolis-San Diego   79.4              98.2-18.8              WRONG
Baltimore-Miami          37.6              82.0-44.4              CORRECT
Atlanta-Arizona          26.8              75.0-48.2              WRONG
Minnesota-Philadelphia   26.8              83.9-57.1              WRONG
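
The confidence figures in the table above match the gap between the two teams' BeatPower scores. Here is a minimal Python sketch of that arithmetic, assuming confidence is simply the difference; the function name `pick_confidence` is illustrative and not taken from the Beatpaths code.

```python
# BeatPower scores copied from the table above:
# (predicted winner, predicted loser, winner BeatPower, loser BeatPower)
picks = [
    ("Indianapolis", "San Diego", 98.2, 18.8),
    ("Baltimore", "Miami", 82.0, 44.4),
    ("Atlanta", "Arizona", 75.0, 48.2),
    ("Minnesota", "Philadelphia", 83.9, 57.1),
]


def pick_confidence(winner_power, loser_power):
    """Assumed definition: confidence is the BeatPower gap, rounded to one decimal."""
    return round(winner_power - loser_power, 1)


confidences = {
    f"{winner}-{loser}": pick_confidence(wp, lp) for winner, loser, wp, lp in picks
}

for game, confidence in confidences.items():
    print(f"{game}: {confidence}")
```

Under that assumption, the computed values line up with the confidence column for all four games.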

Beatpaths seems much more confident about the Indianapolis-San Diego matchup than the conventional wisdom. ESPN’s inLine only gave Indianapolis a 3-point spread before the game started. Likewise, Beatpaths disagrees with the conventional wisdom about the Minnesota-Philadelphia matchup. The system gives Minnesota the edge, whereas the majority of commentators seem to give Philadelphia the advantage.

7 Responses to Pick Confidence – Wildcard Round

  1. Rick says:

    I haven’t posted on here all season because the rankings have, this year, been so abysmal.
    I said last year that the simple concept of wins and losses won’t provide enough information to be a stable and predictable means of determining who the best teams are, or how they will perform week to week.

    The primary reason for this is the small statistical amount of information. The second reason is that wins are a function of several variables, not a variable in and of themselves.

    The information provided is a good base, but has to be improved with something to make it useful and worthwhile.

    One thing I used to tinker with is offensive yards per point, and defensive yards per point. Generally speaking, teams that are good will have a better ratio (reasoning speaks for itself). I haven’t studied this relationship in many years, but I’m pretty sure it would still stand up.
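
The yards-per-point idea in the comment above can be sketched in a few lines of Python. The season totals are hypothetical, and the reading of "better" (fewer yards per point on offense, more yards per point allowed on defense) is an assumption, since the comment does not spell it out.

```python
def yards_per_point(yards, points):
    """Yards covered for each point on the board; None when no points were scored."""
    if points == 0:
        return None
    return yards / points


# Hypothetical season totals, for illustration only.
offense_ypp = yards_per_point(yards=5600, points=400)  # yards gained per point scored
defense_ypp = yards_per_point(yards=5100, points=300)  # yards allowed per point allowed

print(f"Offense: {offense_ypp:.1f} yards per point")
print(f"Defense: {defense_ypp:.1f} yards per point")
```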

  2. The MOOSE says:

    Weighted method wins week 1. It picked SD over IND and PHI over MIN.

  3. doktarr says:

    Iterative without the MIN->NYG questionable game had PHI/MIN as a pick’em game, too.

    Rick, the whole point of this exercise, as I understand it, is not to make the best system possible. It’s to show how a small data set (just wins and losses) can produce more accurate results than it typically does (i.e. just winning percentage) if you use the data set in a more clever way. And when you look at my favorite version of the rankings (http://www.beatgraphs.com/images/I_2008_17_AL.png) the results seem pretty reasonable to me.

    It’s really tough to fault a system for picking the 12-4 team over the 8-8 team. And honestly, the Colts really should have won that game. Usually when you out-gain your opponent from scrimmage (in regulation) by almost 100 yards, and you have zero turnovers to your opponents’ two, you win. They lost because the SD punter had the best game a punter has ever had, because the Chargers got several marginal calls down the stretch*, and because they lost a coin flip. Despite all that, they still could have won if they convert a 3rd-and-2. Sproles went off, sure, but that was part of a tactical choice the Colts made to shut down the Chargers deep passing game (which worked; they shut down Jackson and held one of the best offenses in the game to 17 points in regulation despite great field position).

    Gah, I’m getting pissed off again. That was the most disappointing Colts game I’ve watched since I started rooting for them circa 2001.

    * None of those calls were clearly wrong, but several of them were less egregious than uncalled things the other way, including a face mask on Freeney in the 4th quarter.

  4. boga says:

    Rick, to chime in with doktarr, this isn’t touted as an end-all, be-all way of ranking teams (like Football Outsiders).

    “The primary reason for this is the small statistical amount of information.”

    Since we are only looking at wins and losses, by the end of the season, we actually do have a lot of information. The top team will have around 50,000 to 100,000 unique beatpaths.

    “The second reason is that wins are a function of several variables, not a variable in and of themselves.”

    I don’t understand how they are not a variable. Sure, they are functions of really only two variables. In a specific game, Points Scored and Points Allowed. The ONLY stat that matters when going to the playoffs (unless you are way deep in tiebreakers), is wins and losses. And when the day ends, all that matters is if the team won, not how they won.

    “The information provided is a good base, but has to be improved with something to make it useful and worthwhile.”

    I think the system is quite sound and fundamental. Take, for instance, Football Outsiders. For most of the season, they ranked the Eagles as the best team in the NFL. In the final season rankings, DVOA has them at number one. Their record? 9-6-1. The iterative and standard rankings have them at 11 and 16, which is much more indicative of how they actually won and lost games.

    Why the big difference? Mainly because Philly lost to Washington. Thus, based on wins and losses, Philly can’t be better than Washington.

    You can walk into a random bar and find a random person, and say X team is better than Y team because of wins and losses. They might not like that answer, but they WILL be able to understand how it works.

    You talk to the same person and talk about DVOA and weighted gains and expected outcomes and replacement level and they have no idea what you are talking about. Plus, you add, well, if they played 1000 times, team Y would win 950, so they are a better team…except when it counted, they lost. Beatpaths goes, well, your rate and counting stats are awesome, but hey, you LOST, down the rankings you go.

    That is why I like beatpaths. Simple, elegant, can be explained to a 6 year old. And if team x beat team y, well, that means they are a better team.
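
The "X beat Y, so X ranks above Y" reasoning in the comment above extends to chains of wins, which is what a beatpath is. Below is a minimal Python sketch that checks whether such a chain exists. It is not the site's actual implementation and ignores loop removal. Only the Washington-over-Philadelphia result comes from the discussion; the other results and the name `has_beatpath` are placeholders.

```python
from collections import defaultdict, deque


def has_beatpath(results, start, target):
    """Return True if a chain of wins leads from start to target."""
    beats = defaultdict(set)
    for winner, loser in results:
        beats[winner].add(loser)

    seen = {start}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for loser in beats[team]:
            if loser == target:
                return True
            if loser not in seen:
                seen.add(loser)
                queue.append(loser)
    return False


# Washington over Philadelphia is from the thread; "Team X" and "Team Y" are made up.
results = [
    ("Washington", "Philadelphia"),
    ("Philadelphia", "Team X"),
    ("Team X", "Team Y"),
]

print(has_beatpath(results, "Washington", "Team Y"))
print(has_beatpath(results, "Team Y", "Washington"))
```

A breadth-first search is used here only because it is short; any graph traversal would do.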


  5. Rick says:

    The system cannot be sound. The record of the system in choosing winners is barely better than I do on my own, which is to say about 55-60%.
    And while it isn’t designed as a be all to end all system, it should perform better if it is to be useful. Your use of the Washington/Philly games as being indicative of Philly being worse than Washington defies logic. I remember 1986, when the Mets won the World Series, the ONLY team they had a losing record to was Philadelphia. Thus, by your logic, Philadelphia was better than the Mets that year. Yet their records were not even comparable. The same can be said for the Eagles and Washington. While Washington did beat Philly twice, Washington had more letdowns over the season, thus they cannot be as good as Philly.

    As far as the amount of data goes, you can say there are X thousands of “beatpaths”. But these are fundamentally unsound. Why? Because a lot of these “paths” are games that were never played – and as your own system is having problems dealing with, beatloops occur far more frequently than is comforting. A “Beatpath” has to be confirmed with something beyond a simple win.

    DVOA WORKS because it accounts for A LOT more of the variables. While it’s unlikely the Eagles will win the Super Bowl, they have been one of the most consistent teams this year on a game to game PERFORMANCE basis (not outcome). Thus, their record could be much, much better than it is except for a few bounces that didn’t go their way. They came very close to losing a record number of games by less than 7 points – meaning they were competitive in all but one of their games this year. Few other teams can say this. The Eagles are where they are because THEY ARE THAT GOOD. Pure and simple.

    I’m a person who loves simplicity (there are only 3 rules to designing a program that allows a flock of birds to fly…there is something poetic in that). But too much simplicity creates as much disharmony as too much chaos. In other words, too much choice is as bad as having no choice.

    Many people cannot understand DVOA, but it does a better job of explaining things than this system does. I’ll continue to visit this site, because there’s a value to the information here, but it’s very limited value. Just look at the top teams! The Jets didn’t even make the playoffs! So how is there value in saying they are a top team? The Colts lost in the Wild Card…so where does that leave you? Minnesota, at #6, has lost to the Eagles, who are 14!

    DVOA, at least, has their top 8 in the playoffs, and 7 of their top 8 are in the final 8. That’s damn good. Only 5 of Beatpaths’ top 8 are in the final 8. Only 6 made the playoffs!

    My suggestion wasn’t to make this a be all to end all system. It was simply to overcome some of the obvious faults in designing a system that is reliant on functions as opposed to variables. The function of a win is something like yards gained + points scored – yards lost – points allowed + intangibles. You can put parentheses where you’d like or alter the function, but that’s it at a base level.

    Wins themselves tell you little about the team. If they did, then the 19-0 New England Patriots would’ve been something to behold as opposed to the 14-6 New York Giants. While DVOA has a hard time dealing with intangibles, too, at least it closes a lot of the gaps that simple wins/losses imply.

    I agree that walking into a bar, you can find a random person who will say X team is better, yada, yada. That’s barstool logic most of the time. It’s meaningless.
    But if people can realistically discuss statistics, a clearer picture is available. The reason the Giants won had a lot to do with DVOA…just not season long DVOA. Their DVOA at the end of the season, and into the playoffs, was excellent. So statistics tell part of the story, while recent trends tell the other part.

    Finding a way to mesh that creates a more full picture. That’s why I made my suggestions.

  6. Tom says:


    As an Eagles fan, I have to disagree with your assessment of the Eagles’ level of play. They are, as they have been for many years, an inconsistent team, playing down to opponents they should beat (Redskins, Bengals, Bears) while beating tough opponents (Cowboys, Giants, Falcons). Aside from their brilliant defense, McNabb and the offense have been sloppy and miserable on the field (although much of that can be placed at the feet of Andy Reid’s criminal playcalling).

    DVOA is good at measuring a team’s potential, and the Eagles certainly are a team with tremendous latent strengths, and players who (when they possess the proper psychological mindset and playcalling that suits their strengths) truly scare opposing teams (i.e. Westbrook). But the Eagles constantly underperform, and they’ve been doing it for years–everyone in Philadelphia knows it and complains about it on 610 AM even when the Eagles win. At least in the case of the Eagles, brilliant performance *isn’t* the most important factor, it’s that they find a way to lose a significant number of games that they shouldn’t. Outcomes are just as important as performance efficiency.

    I think your criticism of Beatpaths based on which teams made the playoffs is slightly off the mark. As you know, because teams are divided up into conferences and divisions, oftentimes truly good teams fail to make the cut into the playoffs because spaces are reserved for teams like the Chargers or the Cardinals who managed to top otherwise miserably bad divisions. The AFC East was a good example of this, in which two solid teams (New England and the NY Jets) were kept out of the playoffs. Meanwhile, I think that the Chargers’ win over Indianapolis is widely viewed as an upset, and even Football Outsiders picked the game for Indy. Their post-wildcard round DVOA numbers confirmed that Indy had better DVOA overall, on offense, and on defense, failing to outperform the Chargers only on special teams.

    However, I do think you’re right that favoring more recent data over older data may improve rankings and picks. I like that Football Outsiders gives less weight to older data, and I proposed a loop-breaking variant a while back that would break the oldest links in loops, something I would like to investigate further.
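
The loop-breaking variant mentioned in the comment above (dropping the oldest link in a loop) is not specified in detail, so the following Python sketch is only one possible reading. The game records, the `week` field, and the function name `break_oldest_link` are all hypothetical.

```python
def break_oldest_link(loop_games):
    """Drop the earliest-played game from a beatloop; return (kept, dropped)."""
    oldest = min(loop_games, key=lambda game: game["week"])
    kept = [game for game in loop_games if game is not oldest]
    return kept, oldest


# Hypothetical three-team loop: A beat B, B beat C, C beat A.
loop = [
    {"winner": "A", "loser": "B", "week": 3},
    {"winner": "B", "loser": "C", "week": 9},
    {"winner": "C", "loser": "A", "week": 12},
]

kept, dropped = break_oldest_link(loop)
print("Dropped:", dropped)
print("Kept:", kept)
```

With the week 3 game removed, the remaining results (B over C, C over A) no longer form a loop, so the more recent games decide the ordering.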

  7. chris clark says:

    To Rick,

    The system can be sound. There is a lot of information in “binary preference rankings”. Your example about the Mets/Phillies is to the point. The goal of the system is to remove “fluke” results of the Phillies beating the Mets but losing to teams the Mets beat (i.e. a beatloop) and thus disproving that the Phillies were better than the Mets, while still relying only on simple stats that are barroom-arguable. You can find some good examples of slight tweaks to the system at http://www.beatgraphs.com.

    And that gets a little to your second point, adding slight extra bits of information to the W-L record to get a weighted score. You’ll find a relatively trivial example of that in the weighted graphs at the above site (i.e. the score for the game is used rather than simply the W-L number). However, adding that information is very tricky because you quickly get away from what is simple enough to be barroom-arguable. Score (for and against) clearly makes the cut. Yards, time-of-possession are other possible candidates, but beyond that it clearly gets hard to achieve the kind of consensus agreement that one needs to make the system trivially obviously correct.

    And therein comes the issue with DVOA. The system takes in a lot of information, but some of its judgments are controversial. How much time has been spent over the last few years explaining why DVOA likes the Eagles when they aren’t actually winning every game? Moreover, how does the system account for those Eagles losses? I, for one, don’t consider it just “luck”, i.e. DVOA is not trivially obviously correct–it may be correct just not in a trivial obvious fashion. However, in this case, there are factors which DVOA does not account for and they do influence why teams win and lose games. And any non-trivial system is going to have that issue–it will make controversial judgments. Only simple systems like beatpaths can escape that because they don’t make subtle judgments, just simple ones–i.e. who won. You can’t argue that point–the winning team won–the loser lost. Arguing that the loser is better is inherently controversial.

    Now down to your function versus variable point. It’s not black-and-white like that. Everything you mentioned as a variable is also a function. Yards are a function of plays and plays a function of individual performance and individual performance a function of player health, motivation, style, etc. Somewhere you get down to quantum mechanics where everything is a function and we are all just waves of possibility. Something is a variable if you measure it and assign it a value. They do that with W-L records and with score. They also do that with yards and time-of-possession. However, DVOA was created in part because the standard measurement of yards wasn’t a good predictor because it valued “junk yards” too highly. However, there isn’t a standard reference for junk yards and those stats aren’t published due to the lack of such a standard. And, the reason there is no standard is because the meaning of junk yards is controversial, not everyone would accept the same definition.

    Thus, the beatpath system is an attempt to extract as much information as possible out of the least controversial measurements. The real question is whether one can get rid of “surprising” assessments based on the uncontroversial measurements, e.g. the ratings of DEN and NYJ at various points in the season. I think that is possible, just with the given data and not relying on outside factors. That does not preclude more sophisticated measurements and ratings (all the way up to and beyond DVOA).

    Both the 18-1 PATs and 14-6 NYG were awesome teams, and the 19-0 PATs would have been even more awesome and yes I was rooting for that team which didn’t actually materialize, almost as much as I rooted for the 1998 19-0 Broncos which also didn’t exist. Most any system will recognize the teams which did exist as quite outstanding.

    Is it possible to get a beatwin based system more in line with a DVOA based system? I think so. And, I don’t think that it requires “more variables”, not that more variables would hurt. Just that there are some solutions within the data itself.
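
The beatloop idea described in the comment above (one team beats another but loses to a team the other one beat) can be illustrated with a small Python sketch. It assumes that every result taking part in a three-team loop is discarded, which may differ from the rules the site actually applies. The team names other than the Mets and Phillies, and the function name `remove_three_team_loops`, are hypothetical.

```python
from itertools import permutations


def remove_three_team_loops(results):
    """Remove every result that takes part in a three-team beatloop (A>B, B>C, C>A)."""
    wins = set(results)
    teams = {team for game in wins for team in game}

    in_loop = set()
    for a, b, c in permutations(teams, 3):
        if (a, b) in wins and (b, c) in wins and (c, a) in wins:
            in_loop.update({(a, b), (b, c), (c, a)})
    return wins - in_loop


# Hypothetical (winner, loser) pairs loosely modeled on the Mets/Phillies example.
results = [
    ("Phillies", "Mets"),
    ("Mets", "Team Z"),
    ("Team Z", "Phillies"),
    ("Mets", "Team W"),
]

remaining = remove_three_team_loops(results)
print(remaining)
```

In this toy data the three looped results cancel out, leaving only the Mets' win over "Team W", so the Phillies' win no longer places them above the Mets.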
