I’ve managed to fire up my old wonky backtesting system. It can’t do much yet, but it gives some interesting information. Here’s what I’ve found.
| Year (beatpath record) | Random | uNet | Prev. Week | Beatloop Str. | Fractional | BPower/Win/Loop | Str. of Beatwins | Bucklin | UPower | UPower/Loop | UNet-Lookahead |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 (84-57) | 150-117 | 166-101 | 157-110 | 160-107 | 164-103 | 167-100 | 166-101 | 161-106 | 164-103 | 161-106 | 157-110 |
| 2005 (91-45) | 162-105 | 152-115 | 152-115 | 162-105 | 156-111 | 160-107 | 151-116 | 156-111 | 156-111 | 157-110 | 152-115 |
| 2006 (94-52) | 156-111 | 163-104 | 159-108 | 161-106 | 163-104 | 167-100 | 164-103 | 164-103 | 163-104 | 161-106 | 159-108 |
| 2007 (89-43) | 162-105 | 164-103 | 169-98 | 162-105 | 165-102 | 167-100 | 158-109 | 166-101 | 165-102 | 171-96 | 169-98 |
| 2008 (80-48) | 158-108 | 159-107 | 160-106 | 156-110 | 163-103 | 159-107 | 160-106 | 160-106 | 163-103 | 161-105 | 160-106 |
| TOTAL: 64.13% | 59.03% | 60.22% | 59.70% | 60.00% | 60.75% | 61.42% | 59.85% | 60.45% | 60.75% | 60.75% | 59.70% |
So, there’s some food for thought. Here’s how to read it: the records in parentheses are the actual beatpaths records, meaning the games between teams that have actual beatpath relationships. All the other methods in the table are tiebreaker methods, meaning ways to rank teams that don’t have beatpath relationships with each other; they are various tiebreakers I have tried out in the past. All are based on the same beatpath graph, so all start from the records in the left column. We wouldn’t expect any of them to have a higher win percentage than the beatpath win percentage, since on top of those games they also have to call the games the graph says nothing about. I haven’t tried out Weighted or Moose’s other ranking system – I’d expect Weighted not to perform very well, and the other ranking system to perform very well. This is also the vanilla method of finding beatpath relationships – not beatflukes or iterative.
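The 64.13% in the TOTAL row can be checked by pooling the five seasons’ beatpath records (438-245) rather than averaging the yearly percentages, which would give about 64.16% instead. A minimal sketch; the function name is just for illustration:

```python
# Beatpath records (wins, losses) from the parenthesized first column of the table.
BEATPATH_RECORDS = {
    2004: (84, 57),
    2005: (91, 45),
    2006: (94, 52),
    2007: (89, 43),
    2008: (80, 48),
}


def pooled_win_pct(records: dict) -> float:
    """Total wins divided by total decided games, as a percentage."""
    wins = sum(w for w, _ in records.values())
    losses = sum(l for _, l in records.values())
    return 100.0 * wins / (wins + losses)


print(f"{pooled_win_pct(BEATPATH_RECORDS):.2f}%")  # 64.13%
```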
I am curious what the historical win percentage of Isaacson-Tarbell is, at least over the same five-year period. This page indicates Isaacson-Tarbell’s long-term win percentage is 62.29%. That is better than every tiebreaker above but below the 64.13% for beatpath games, which suggests that perhaps the best strategy is: 1) pick the team with a beatpath relationship to the other team, 2) if there isn’t one, pick the team with the better record, 3) if the records are tied, pick either the home team or the one selected by the beatpaths tiebreaker.
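The three-step strategy could be sketched as a single pick function. Everything here is hypothetical: the names, the `(wins, losses)` record format, and treating a team with no games yet as .500 are assumptions for illustration. `beatpaths` is assumed to already hold every (winner, loser) pair connected by a beatpath, not just direct wins.

```python
from typing import Dict, Optional, Set, Tuple

Record = Tuple[int, int]  # (wins, losses); an assumed format


def win_pct(record: Record) -> float:
    wins, losses = record
    games = wins + losses
    return wins / games if games else 0.5  # no games yet: treat as .500


def pick(home: str, away: str,
         beatpaths: Set[Tuple[str, str]],
         records: Dict[str, Record],
         tiebreak_rank: Optional[Dict[str, int]] = None) -> str:
    # 1) A beatpath relationship decides the game outright.
    if (home, away) in beatpaths:
        return home
    if (away, home) in beatpaths:
        return away
    # 2) Otherwise take the better record (the Isaacson-Tarbell rule).
    home_pct, away_pct = win_pct(records[home]), win_pct(records[away])
    if home_pct != away_pct:
        return home if home_pct > away_pct else away
    # 3) Tied records: use the tiebreaker ranking (lower = better) if given,
    #    otherwise fall back to the home team.
    if tiebreak_rank is not None:
        return home if tiebreak_rank[home] <= tiebreak_rank[away] else away
    return home
```

Putting the beatpath check first means a team can be picked despite a worse record, which is exactly where the 64.13% vs 62.29% difference would have to come from.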
Hopefully in the future I’ll be able to look at how the beatpaths percentage (64.13%) compares to beatflukes, iterative, or whatever other kind of beatloop resolution scheme we come up with.
Well, at least everything is performing better than random. Is the percentage at the bottom of the first column saying that picks where there was an actual beatpaths relationship were 64.13% correct? That’s a pretty good figure, but unfortunately it only applies to a limited number of games.
Why were there only 266 picks in 2008, as opposed to 267 in every other season? Was it due to the Philadelphia/Cincinnati tie game?
Yeah, that’s right – the whole point is to turn the DAG into a full ordering of teams that is as accurate as the DAG itself. The 64.13% is for the DAG.
I think that’s right about the 266 – we treat ties the same as if they hadn’t been played.
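A minimal sketch of scoring one full ranking against a season’s results, assuming each pick simply favors the higher-ranked team. The function and the `(team_a, team_b, winner)` layout are hypothetical, not the actual backtesting script. Skipping ties is what turns a 267-game season into 266 picks.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical game result: (team_a, team_b, winner); winner is None for a tie.
Game = Tuple[str, str, Optional[str]]


def score_ranking(ranking: Sequence[str], games: Sequence[Game]) -> Tuple[int, int]:
    """Pick the higher-ranked team in every game; return (correct, wrong)."""
    position = {team: i for i, team in enumerate(ranking)}
    correct = wrong = 0
    for a, b, winner in games:
        if winner is None:
            continue  # ties are treated as if the game had not been played
        predicted = a if position[a] < position[b] else b
        if predicted == winner:
            correct += 1
        else:
            wrong += 1
    return correct, wrong
```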
The other thing that is odd is that I have a note in my code for the best-performing tiebreaker (BPower/Win/Loop, at 61.42%):
```
# This one doesn't really even make sense because a bad team with
# one beatwin could be ranked ahead of a good team with no beatwins.
```
And yet, it’s clearly better than the other ones. I’ve never used this tiebreaker in seasons, but I’m tempted to try now.
I’m not clear on what all of those tiebreakers mean. Could you provide definitions or links to them?
How does “random” possibly have a 60% winning percentage? Shouldn’t it be a coin flip, by definition?
The tiebreakers only apply to teams that don’t have beatpath relationships with each other. For instance, at the beginning of the rankings process, there might be five teams that have no beatlosses. Only one of the five can be ranked #1. Random picks randomly between those five. Every method, random included, still gets the beatpath-relationship games right 64% of the time, which is why random is about 60% and not 50%.
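The process described here (at each step only teams with no remaining beatlosses are eligible, and a tiebreaker decides which of them takes the next spot) is essentially Kahn’s topological sort with a pluggable choice rule. A sketch under stated assumptions: the names are hypothetical, and beatloops are assumed to be resolved already so the graph is a DAG.

```python
import random
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (winner, loser) in the beatpath DAG


def rank_dag(teams: Iterable[str], beatwins: Set[Edge],
             choose: Callable[[List[str]], str]) -> List[str]:
    """Order teams so every beatpath winner comes before its loser.

    At each step the eligible teams are those with no beatlosses to
    still-unranked teams; `choose` is the tiebreaker among them.
    """
    remaining = set(teams)
    beatlosses: Dict[str, int] = {t: 0 for t in remaining}
    losers_of: Dict[str, List[str]] = {t: [] for t in remaining}
    for winner, loser in beatwins:
        beatlosses[loser] += 1
        losers_of[winner].append(loser)

    ranking: List[str] = []
    while remaining:
        eligible = sorted(t for t in remaining if beatlosses[t] == 0)
        if not eligible:
            raise ValueError("beatloop found; resolve loops before ranking")
        top = choose(eligible)
        ranking.append(top)
        remaining.remove(top)
        # Ranking `top` removes its beatwins from the remaining teams' counts.
        for loser in losers_of[top]:
            beatlosses[loser] -= 1
    return ranking


# The "Random" tiebreaker: choose uniformly among the eligible teams.
rng = random.Random(2008)
example = rank_dag("ABCDE", {("A", "B"), ("B", "C"), ("D", "E")}, rng.choice)
```

Swapping `rng.choice` for another rule is all it takes to try a different tiebreaker. Every ordering produced this way agrees with the DAG, so the methods can only differ on games without a beatpath relationship.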
Had a configuration error in my backtesting script – it turns out that all these numbers are for runs with beatflukes turned on. That means more teams are reflected in the beatpaths graph than would be otherwise. I’ll re-run as “vanilla” and post some updated numbers.
Sheesh, one more error – what’s interesting is that each season’s numbers are based on graphs built from *all* the game outcomes in the dataset so far.
So 2006’s numbers are actually based on a beatfluke graph that comprises all the games from 2003-2006.
Sorry.
I’ll have more accurate results in the next rundown.
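The mix-up comes down to which games feed each season’s graph. A hypothetical helper (the `(season, winner, loser)` tuple layout is an assumption, not the script’s actual format) contrasting a single-season selection with the cumulative one that was actually used:

```python
from typing import List, Sequence, Tuple

# Hypothetical game record: (season, winner, loser)
GameRecord = Tuple[int, str, str]


def games_for_graph(games: Sequence[GameRecord], season: int,
                    cumulative: bool = False) -> List[GameRecord]:
    """Select the games used to build the beatpath graph for `season`."""
    if cumulative:
        # What the run did: every season up to and including this one,
        # e.g. 2003-2006 for the 2006 numbers.
        return [g for g in games if g[0] <= season]
    # Presumably the intended setup: that season's games only.
    return [g for g in games if g[0] == season]
```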
I think I’ve managed to code in support for MOOSE’s tie-breaking procedure – ins and outs – and using this set of assumptions (five years of team data, which is admittedly not usually what we’re measuring), I get 59.22%.