nflfastR

nflfastR is a set of functions to efficiently scrape NFL play-by-play data. nflfastR expands upon the features of nflscrapR:

  • The package contains NFL play-by-play data back to 1999
  • As suggested by the package name, it obtains games much faster than nflscrapR
  • Includes completion probability (cp), completion percentage over expected (cpoe), and expected yards after the catch (xyac_epa and xyac_mean_yardage) in play-by-play data going back to 2006
  • Includes drive information, including drive starting position and drive result
  • Includes series information, including series number and series success
  • Hosts a repository of play-by-play data going back to 1999 for very quick access
  • Features models for Expected Points, Win Probability, Completion Probability, and Yards After the Catch (see section below)
  • Includes a function update_db() that creates and updates a local play-by-play database
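As a rough sketch of the update_db() workflow, the block below builds the database in a temporary directory and then queries it with DBI. It assumes the DBI and RSQLite packages are installed and that the default database file and table names (pbp_db and nflfastR_pbp) apply; check ?update_db for the exact arguments in your installed version. The first run downloads every season since 1999, so it can take a while.

```r
library(nflfastR)

# Hypothetical location for the database file; any writable folder works
db_dir <- tempdir()

# Create the SQLite database, or add any missing games if it already exists
update_db(dbdir = db_dir)

# Connect to the file written by update_db() (assumed default name "pbp_db")
con <- DBI::dbConnect(RSQLite::SQLite(), file.path(db_dir, "pbp_db"))

# Count plays per season in the (assumed default) table "nflfastR_pbp"
plays_per_season <- DBI::dbGetQuery(
  con,
  "SELECT season, COUNT(*) AS n_plays FROM nflfastR_pbp GROUP BY season"
)
print(plays_per_season)

# Always release the connection when done
DBI::dbDisconnect(con)
```

Later calls to update_db() with the same directory only append games that are not yet in the table, which is what makes the local database cheap to keep current.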

We owe a debt of gratitude to the original nflscrapR team, Maksim Horowitz, Ronald Yurko, and Samuel Ventura, without whose contributions and inspiration this package would not exist.

Installation

The easiest way to get nflfastR is to install it from CRAN with:

install.packages("nflfastR")

To get a bug fix or to use a feature from the development version, you can install the development version of nflfastR either from GitHub with:

if (!require("pak")) install.packages("pak")
pak::pak("nflverse/nflfastR")

or prebuilt from the development repo with:

install.packages("nflfastR", repos = c("https://nflverse.r-universe.dev", getOption("repos")))

Usage

We have provided some application examples in the Getting Started article. However, these require a basic knowledge of R. For this reason we also offer the nflfastR beginner’s guide, which we recommend to anyone looking for an introduction to nflfastR with R.

You can find column names and descriptions in the Field Descriptions article, or by accessing the field_descriptions data frame included in the package.
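A quick way to search field_descriptions from R, as a sketch: it assumes the data frame has the columns Field and Description (check names(field_descriptions) in your installed version). The keyword "probability" is just an illustrative search term.

```r
library(nflfastR)

# field_descriptions ships with the package; inspect its structure first
str(field_descriptions)

# Look up a few columns by exact name
subset(field_descriptions, Field %in% c("epa", "wp", "cpoe"))

# Search the descriptions for a keyword (case-insensitive)
prob_fields <- field_descriptions[
  grepl("probability", field_descriptions$Description, ignore.case = TRUE),
]
head(prob_fields)
```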

Data access

Even though nflfastR is very fast, we recommend downloading the data from here or using the nflreadr package. These data sets include play-by-play data of complete seasons going back to 1999 and are updated nightly during the season. The files contain both regular season and postseason data, and one can use game_type or week to figure out which games occurred in the postseason.
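A minimal sketch of loading one season with nflreadr and isolating the postseason. It uses the season_type column (values "REG" and "POST") rather than game_type; the season 2023 is an arbitrary choice. Filtering on week > 18 would select the same games from 2021 onward, when the regular season has 18 weeks.

```r
library(dplyr)

# Load one full season of play-by-play (regular season and postseason)
pbp <- nflreadr::load_pbp(seasons = 2023)

# Keep only postseason plays
post <- pbp %>% filter(season_type == "POST")

# One row per postseason game, with its week number
post_games <- post %>% distinct(game_id, week)
print(post_games)
```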

nflfastR models

nflfastR uses its own models for Expected Points, Win Probability, Completion Probability, and Expected Yards After the Catch. To read about the models, please see this post on Open Source Football. For a more detailed description of the motivation for Expected Points models, we highly recommend this paper from the nflscrapR team located here.

Here is a visualization of the Expected Points model by down and yardline.
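The Expected Points model can also be queried directly with calculate_expected_points(). The sketch below scores a first down at several field positions; the input columns follow the required columns listed in ?calculate_expected_points as we understand them, and the game states themselves are hypothetical.

```r
library(nflfastR)

# Hypothetical game states: 1st & 10 at several field positions
# (yardline_100 = yards from the opponent's end zone)
states <- tibble::tibble(
  season = 2023,
  home_team = "SEA",
  posteam = "SEA",
  roof = "outdoors",
  half_seconds_remaining = 1800,
  yardline_100 = c(90, 75, 50, 25, 10),
  down = 1,
  ydstogo = 10,
  posteam_timeouts_remaining = 3,
  defteam_timeouts_remaining = 3
)

# Returns the input with ep and next-score probabilities (td_prob, fg_prob, ...) added
ep_data <- calculate_expected_points(states)
ep_data[, c("yardline_100", "td_prob", "fg_prob", "ep")]
```

Moving from one's own 10 (yardline_100 = 90) toward the opponent's goal line should raise ep, mirroring the yardline axis of the visualization.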

Here is a visualization of the Completion Probability model by air yards and pass direction.
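To reproduce a rough tabular version of that view from the hosted data, one can average the cp column by pass_location and air-yards bucket. This is a sketch: the season and the bucket edges are arbitrary illustrative choices, not the ones used for the official plot.

```r
library(dplyr)

pbp <- nflreadr::load_pbp(seasons = 2023)

# Average modeled completion probability by pass direction and air-yards bucket
cp_summary <- pbp %>%
  filter(!is.na(cp), !is.na(pass_location)) %>%
  mutate(air_bin = cut(air_yards, breaks = c(-Inf, 0, 10, 20, Inf))) %>%
  group_by(pass_location, air_bin) %>%
  summarise(mean_cp = mean(cp), attempts = n(), .groups = "drop")

print(cp_summary)
```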

nflfastR includes two win probability models: one with and one without incorporating the pre-game spread.
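A sketch contrasting the two models with calculate_win_probability(), which we understand to return wp (without the spread) and vegas_wp (with it). The opening-drive game states are hypothetical, differ only in spread_line (points the home team is favored by), and the input columns follow ?calculate_win_probability as we understand it.

```r
library(nflfastR)

# Hypothetical opening-drive states that differ only in the pre-game spread
states <- tibble::tibble(
  receive_2h_ko = 0,
  home_team = "SEA",
  posteam = "SEA",
  score_differential = 0,
  half_seconds_remaining = 1800,
  game_seconds_remaining = 3600,
  spread_line = c(-10, -3, 0, 3, 10),
  down = 1,
  ydstogo = 10,
  yardline_100 = 75,
  posteam_timeouts_remaining = 3,
  defteam_timeouts_remaining = 3
)

wp_data <- calculate_win_probability(states)

# wp ignores the spread, so it should be identical in every row;
# vegas_wp uses it and should be higher when the home team is a bigger favorite
wp_data[, c("spread_line", "wp", "vegas_wp")]
```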

Special thanks

  • To Nick Shoemaker for finding and making available JSON-formatted NFL play-by-play back to 1999 (nflfastR uses this source for 1999 and 2000 and previously also used it for 2001-2010)
  • To Lau Sze Yui for developing a scraping function to access JSON-formatted NFL play-by-play beginning in 2001
  • To Aaron Schatz and FTN Fantasy for providing charting data to correctly mark scrambles in the 1999-2005 seasons
  • To Lee Sharpe for curating a resource for game information
  • To Timo Riske, Lau Sze Yui, Sean Clement, and Daniel Houston for many helpful discussions regarding the development of the new nflfastR models
  • To Zach Feldman and Josh Hermsmeyer for many helpful discussions about CPOE models, as well as to Peter Owen for many helpful suggestions for the CP model
  • To Florian Schmitt for the logo design
  • To the many users who found and reported bugs in nflfastR 1.0
  • And of course, the original nflscrapR team, Maksim Horowitz, Ronald Yurko, and Samuel Ventura, whose work represented a dramatic step forward for the state of public NFL research

Version: 5.0.0
License: MIT + file LICENSE
Maintainer: Ben Baldwin
Last Published: November 26th, 2024

Functions in nflfastR (5.0.0)

  • add_qb_epa: Compute QB epa
  • add_xyac: Add expected yards after completion (xyac) variables
  • add_xpass: Add expected pass columns
  • build_nflfastR_pbp: Build a Complete nflfastR Data Set
  • calculate_player_stats_def: Get Official Game Stats on Defense
  • calculate_player_stats_kicking: Summarize Kicking Stats
  • fast_scraper_schedules: Load NFL Season Schedules
  • calculate_expected_points: Compute expected points
  • load_pbp: Load Play By Play
  • fast_scraper: Get NFL Play by Play Data
  • fast_scraper_roster: Load Team Rosters for Multiple Seasons
  • teams_colors_logos: NFL Team names, colors and logo urls.
  • update_db: Update or Create a nflfastR Play-by-Play Database
  • calculate_series_conversion_rates: Compute Series Conversion Information from Play by Play
  • calculate_player_stats: Get Official Game Stats
  • load_player_stats: Load Player Level Weekly Stats
  • field_descriptions: nflfastR Field Descriptions
  • clean_pbp: Clean Play by Play Data
  • calculate_standings: Compute Division Standings and Conference Seeds from Play by Play
  • nflfastR-package: nflfastR: Functions to Efficiently Access NFL Play by Play Data
  • report: Get a Situation Report on System, nflverse Package Versions and Dependencies
  • decode_player_ids: Decode the player IDs in nflfastR play-by-play data
  • missing_raw_pbp: Compute Missing Raw PBP Data on Local Filesystem
  • calculate_stats: Calculate NFL Stats
  • calculate_win_probability: Compute win probability
  • save_raw_pbp: Download Raw PBP Data to Local Filesystem
  • nfl_stats_variables: NFL Stats Variables
  • stat_ids: NFL Stat IDs and their Meanings