NFL Ratings - APIs

Background

In this series, I’ve been building a Python program to analyze and predict the outcomes of NFL games. To do this, I run a script each week which prompts me to record the outcome of each game the previous week. Using these outcomes, I update an Elo-based rating system. Then I run another script which prompts me to input next week’s schedule, at which point the Elo system can make automated predictions about next week’s outcomes.
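
For readers who have not seen the earlier posts, here is a minimal sketch of what a single Elo update looks like. It uses the textbook Elo formula; the function names and the value k=20 are assumptions for illustration and are not necessarily what this series uses, and ties are ignored.

```python
from __future__ import annotations


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 20.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one game.

    k=20 is an illustrative value, not necessarily the one used in this series.
    """
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta
```

Two evenly rated teams each have an expected score of 0.5, so with k=20 the winner gains 10 points and the loser drops 10.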

This is all well and good, but if you’re like me, you rolled your eyes reading that last paragraph. It’s inefficient. All that data needs to be entered manually into a command line interface. Not very fun! So we need to automate this process somehow.

Approach

We’re going to use an API to get both the weekly schedule and the postgame results. Thankfully, ESPN has a good collection of API endpoints from which we can pull all the data we need!

For example, instead of running the script to input next week’s schedule, we can just query the API to get the entire schedule! I’ve created a function called query_game_links, which finds the current week’s schedule and generates a URL for info on each of these games. Then we just have to visit each of these one by one in order to get the game data. It looks something like this:

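import requests

# query_game_links and alias (abbreviation -> team name) are defined elsewhere in the project.
meta, links = query_game_links()

games = []
for link in links:
    r = requests.get(url=link, timeout=10)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    resp = r.json()

    # Split shortName on '@' (or 'VS'): the first abbreviation is treated as away, the second as home.
    matchup = resp['shortName'].upper().replace('VS', '@').split('@')
    away, home = [alias[abbrev.strip().capitalize()] for abbrev in matchup]
    games.append({"Home": home, "Away": away})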

In that code snippet, we only extract the names of the two teams. The response carries a lot more data, usually including weather, but not for every game. Because the weather data is sometimes missing, I don’t want to incorporate it into the system until I understand the API’s behavior better and can handle the case where it’s absent.
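
One way to account for the missing case is to treat weather as optional from the start. The sketch below is not part of the program yet. The key names ("weather", "temperature", "displayValue") are assumptions about the response shape and should be checked against a real response before relying on them.

```python
from __future__ import annotations

from typing import Any, Optional


def extract_weather(resp: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a small weather dict, or None when the response has no weather."""
    weather = resp.get("weather")  # assumed key name; absent for some games
    if not isinstance(weather, dict):
        return None
    return {
        # .get() returns None for any individual field that is missing
        "temperature": weather.get("temperature"),
        "conditions": weather.get("displayValue"),
    }
```

Returning None instead of raising lets the caller decide what to do with a game that has no weather, for example skipping the weather adjustment for that game only.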

In a similar way, we can get info on the past games as well. This way the user doesn’t have to input any scores in the command line (if they don’t want to). In addition, we’ll be able to easily add more statistics to the prediction model later.

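import requests

# URL is assumed to be defined earlier (the endpoint for the week being scored).
r = requests.get(url=URL, timeout=10)
r.raise_for_status()  # stop on an HTTP error instead of parsing an error body
resp = r.json()

games = []
for game in resp['events']:
    # Reset all four values per game so nothing carries over from the previous one.
    home, away = None, None
    homescore, awayscore = None, None
    for competitor in game['competitions'][0]['competitors']:
        tm = competitor['team']['name']
        score = int(competitor['score'])
        if competitor['homeAway'] == 'home':
            home = tm
            homescore = score
        elif competitor['homeAway'] == 'away':
            away = tm
            awayscore = score
    assert home is not None and away is not None, "Incomplete game"
    games.append({"Home": home,
                  "Home Score": homescore,
                  "Away": away,
                  "Away Score": awayscore})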

That’s just about all we need to update our ratings each week! Once again, though, we’ve queried far more data than we end up using. The response includes all kinds of other game stats besides each team’s score. We might not want any of them for the simple Elo update, but they could be very useful when it comes to offensive rating (ORTG) and defensive rating (DRTG).
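
As a rough illustration of where the games list could go next, the sketch below computes average points scored and allowed per team. This is only a crude stand-in, not the ORTG/DRTG definition from this series, and the function name points_per_game is hypothetical.

```python
from collections import defaultdict


def points_per_game(games):
    """Average points scored and allowed per team, from a list of game dicts
    with the keys "Home", "Home Score", "Away" and "Away Score"."""
    totals = defaultdict(lambda: {"games": 0, "for": 0, "against": 0})
    for g in games:
        # Each game contributes one row per team: (team, points scored, points allowed).
        sides = [(g["Home"], g["Home Score"], g["Away Score"]),
                 (g["Away"], g["Away Score"], g["Home Score"])]
        for team, scored, allowed in sides:
            totals[team]["games"] += 1
            totals[team]["for"] += scored
            totals[team]["against"] += allowed
    return {team: {"ppg_for": t["for"] / t["games"],
                   "ppg_against": t["against"] / t["games"]}
            for team, t in totals.items()}
```

Raw averages like these ignore opponent strength, which is the main thing a proper offensive or defensive rating would need to correct for.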

Potential Improvements

  • Weather Data
    • As mentioned earlier, this is available through the APIs I currently use, but it will take some attention to incorporate it into the program.
  • Clever Off/Def ratings
    • Again, I mentioned this in the previous section, as well as the last blog post.
  • Roster information
    • One of the most common sources of discrepancy between my lines and the Vegas lines¹ is that Vegas knows of recent roster changes. My model does not take rosters into account at all, at least not at the moment.
  1. By the way, ‘Vegas’ is a catch-all term used to refer to the major sportsbooks (e.g. DraftKings, FanDuel), which tend to have the most accurate lines when it comes to sports betting.