I’ve managed to fire up my old, wonky backtesting system. It doesn’t do much yet, but it already gives some interesting information. Here’s what I’ve found.

| Year | Beatpaths | Random | uNet | Prev. Week | Beatloop Str. | Fractional | BPower/Win/Loop | Str. of Beatwins | Bucklin | UPower | UPower/Loop | UNet-Lookahead |
|-------|-----------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| 2004 | (84-57) | 150-117 | 166-101 | 157-110 | 160-107 | 164-103 | 167-100 | 166-101 | 161-106 | 164-103 | 161-106 | 157-110 |
| 2005 | (91-45) | 162-105 | 152-115 | 152-115 | 162-105 | 156-111 | 160-107 | 151-116 | 156-111 | 156-111 | 157-110 | 152-115 |
| 2006 | (94-52) | 156-111 | 163-104 | 159-108 | 161-106 | 163-104 | 167-100 | 164-103 | 164-103 | 163-104 | 161-106 | 159-108 |
| 2007 | (89-43) | 162-105 | 164-103 | 169-98 | 162-105 | 165-102 | 167-100 | 158-109 | 166-101 | 165-102 | 171-96 | 169-98 |
| 2008 | (80-48) | 158-108 | 159-107 | 160-106 | 156-110 | 163-103 | 159-107 | 160-106 | 160-106 | 163-103 | 161-105 | 160-106 |
| TOTAL | 64.13% | 59.03% | 60.22% | 59.70% | 60.00% | 60.75% | 61.42% | 59.85% | 60.45% | 60.75% | 60.75% | 59.70% |

So, there’s some food for thought. Here’s how to read it: the records in parentheses are the actual beatpaths records, meaning the picks for games where a beatpath relationship actually exists between the two teams. All the other methods in the table are tiebreaker methods, that is, ways to rank teams that don’t have beatpath relationships with each other. All are based on the same beatpath graph, so all of them start from the beatpaths records in parentheses. We wouldn’t expect any of them to have a higher win percentage than the beatpath win percentage itself. These are various tiebreaker methods I’ve tried out in the past. I haven’t tried Weighted or MOOSE’s other ranking system; I’d expect Weighted not to perform very well, and the other ranking system to perform very well. This also uses the vanilla method of finding beatpath relationships – not beatflukes or iterative.
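To make the mechanics concrete, here’s a minimal sketch of how a tiebreaker slots into the ranking process. The names and structure are my own illustration, not the actual beatpaths code: repeatedly promote a team with no remaining beatlosses, and when several qualify, let the tiebreaker choose among them.

```python
import random

# Illustrative sketch (not the real beatpaths code): rank teams from a
# beatpath DAG with a pluggable tiebreaker. `beatwins` maps each team to
# the set of teams it holds a beatpath over.
def rank(beatwins, tiebreak=random.choice):
    teams = set(beatwins)
    ranking = []
    while teams:
        # Any team still beaten by a remaining team can't be ranked yet.
        beaten = {loser for t in teams for loser in beatwins[t] if loser in teams}
        candidates = sorted(teams - beaten)  # teams with no beatlosses left
        pick = tiebreak(candidates)          # random, prev. week, etc.
        ranking.append(pick)
        teams.remove(pick)
    return ranking

# Tiny example: A has beatpaths over B and C; B and C are unrelated,
# so a tiebreaker decides their order.
graph = {"A": {"B", "C"}, "B": set(), "C": set()}
print(rank(graph, tiebreak=lambda cands: cands[0]))  # ['A', 'B', 'C']
```

With the random tiebreaker, B and C come out in either order, but A is always first because it’s the only team with no beatlosses at the start.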

I am curious what the historical win percentage of Isaacson-Tarbell is, at least over the same five-year period. This page indicates Isaacson-Tarbell’s long-term win percentage is 62.29%, which suggests that perhaps the best strategy to follow is: 1) pick the team with a beatpath relationship over its opponent; 2) if there isn’t one, pick the team with the better record; 3) if their records are tied, pick either the home team or the one selected by the beatpaths tiebreaker.
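That three-step strategy could be sketched like this (the function and helper names are hypothetical, not from any real backtesting script):

```python
# Hypothetical sketch of the three-step pick rule described above.
def pick_winner(home, away, beatpath_pick, records, tiebreak_pick):
    """Return the predicted winner of one game.

    beatpath_pick:  the team favored by a beatpath relationship, or None.
    records:        dict mapping team -> (wins, losses).
    tiebreak_pick:  fallback pick (the home team, or the beatpaths tiebreaker).
    """
    # 1) A beatpath relationship trumps everything.
    if beatpath_pick is not None:
        return beatpath_pick

    # 2) Otherwise take the team with the better win percentage.
    def win_pct(team):
        w, l = records[team]
        return w / (w + l) if w + l else 0.5

    if win_pct(home) != win_pct(away):
        return home if win_pct(home) > win_pct(away) else away

    # 3) Records tied: fall back to the home team or the beatpaths tiebreaker.
    return tiebreak_pick

records = {"PHI": (9, 5), "DAL": (8, 6)}
print(pick_winner("PHI", "DAL", None, records, "PHI"))  # PHI (better record)
```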

Hopefully in the future I’ll be able to look at how the beatpaths percentage (64.13%) compares to beatflukes, iterative, or whatever other kind of beatloop resolution scheme we come up with.

8 Responses to Backtesting

  1. JT says:

    Well, at least everything is performing better than random. Is the percentage at the bottom of the first column saying that picks where there was an actual beatpaths relationship were 64.13% correct? That’s a pretty good figure, but unfortunately it only applies to a limited number of games.

    Why were there only 266 picks in 2008, as opposed to 267 in every other season? Was it due to the Philadelphia/Cincinnati tie game?

  2. ThunderThumbs says:

    Yeah, that’s right – the whole attempt is to flatten the DAG into a total ordering that is as accurate as the DAG itself. The 64.13% is for the DAG.

    I think that’s right about the 266 – we treat ties the same as if they hadn’t been played.

  3. ThunderThumbs says:

    The other thing that is odd is that I have a note in my code for the best-performing tiebreaker:

    # This one doesn’t really even make sense because a bad team with
    # one beatwin could be ranked ahead of a good team with no beatwins.

    And yet, it’s clearly better than the other ones. I’ve never used this tiebreaker during an actual season, but I’m tempted to try it now.
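    For reference, that tiebreaker amounts to something like this under my reading of it (a simplified sketch, not the real code): among candidates with no beatpath relationship, prefer the team with more beatwins, regardless of its overall record.

```python
# Hypothetical sketch of the "Str. of Beatwins" tiebreaker described above:
# among tied candidates, pick the team with the most beatwins in the graph,
# even if its actual win-loss record is worse.
def beatwin_tiebreak(candidates, beatwins):
    return max(candidates, key=lambda team: len(beatwins[team]))

# B has one beatwin (over D); C has none. B gets ranked ahead of C, even
# if C happens to have the better record -- the oddity noted above.
graph = {"B": {"D"}, "C": set(), "D": set()}
print(beatwin_tiebreak(["B", "C"], graph))  # B
```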

  4. the silent speaker says:

    I’m not clear on what all of those tiebreakers mean — could you provide definitions or links to same?

    How does “random” possibly have a 60% winning percentage? Shouldn’t it be a coin flip, by definition?

  5. ThunderThumbs says:

    The tiebreakers only apply to teams that don’t have beatpath relationships with each other. For instance, at the beginning of the rankings process, there might be five teams that have no beatlosses. Only one of the five can be ranked #1. Random picks randomly between those five. Since the beatpath relationship picks have a 64% accuracy, that’s why random is about 60% and not 50%.
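    As a worked example of that blending, take 2004’s row from the table: beatpath relationships decided the games behind the (84-57) record, and Random’s overall line was 150-117, so the tiebreak-only picks were close to a coin flip:

```python
# 2004's numbers from the table: beatpath picks went 84-57, and Random's
# overall record (beatpath picks plus random tiebreak picks) was 150-117.
beatpath_w, beatpath_l = 84, 57
total_w, total_l = 150, 117

# The tiebreak-only portion is whatever is left over.
tiebreak_w = total_w - beatpath_w          # 66
tiebreak_l = total_l - beatpath_l          # 60
tiebreak_games = tiebreak_w + tiebreak_l   # 126

print(f"tiebreak-only accuracy: {tiebreak_w / tiebreak_games:.1%}")   # 52.4%
print(f"overall accuracy:       {total_w / (total_w + total_l):.1%}") # 56.2%
```

    The random picks themselves hover near 50%; it’s the ~64%-accurate beatpath picks mixed in that pull the overall column up.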

  6. ThunderThumbs says:

    Had a configuration error in my backtesting script – it turns out that all these numbers are for if beatflukes are turned on. That means that more teams are reflected in the beatpaths graph than they would be otherwise. I’ll re-run as “vanilla” and post some updated numbers.

  7. ThunderThumbs says:

    Sheesh, one more error – what’s interesting is that each season’s numbers are based on a graph built from *all* the game outcomes in the dataset up to that point.

    So, 2006’s numbers are actually based on a beatfluke graph that comprises all the games from 2003-2006.

    Sorry. 🙂 I’ll have more accurate results the next rundown.

  8. ThunderThumbs says:

    I think I’ve managed to code in support for MOOSE’s tie-breaking procedure (ins and outs), and using this set of assumptions (five years of team data, which is admittedly not usually what we’re measuring), I get 59.22%.
