Note: A newer version (1.8.3) of this package is available.

Introducing the nflscrapR Package

This package was built to let R users access and analyze data from the National Football League (NFL) API. Its functions support analysis at the play and game levels, for single games and for entire seasons. By parsing the play-by-play data recorded by the NFL, the package lets NFL data enthusiasts examine each facet of the game in greater detail. It puts granular data into the hands of any R user interested in analyzing and digging up insights about American football. With open-source data, reproducible advanced NFL metrics can be developed more quickly, which in turn helps the football analytics community grow.

Note: Data is available only from the 2009 season onward.

Downloading and Loading the Package

# Installing from GitHub requires the devtools package; if it is not already
# installed, run the commented-out line below first
# install.packages('devtools')
library(devtools)

devtools::install_github(repo = "maksimhorowitz/nflscrapR")
#> Skipping install for github remote, the SHA1 (05815ef8) has not changed since last install.
#>   Use `force = TRUE` to force installation

# Load the package

library(nflscrapR)

Simple Example of Package Usage

The example below compares the distribution of EPA (expected points added) per pass attempt across the 2009-2016 NFL seasons, restricted to passers with at least 50 attempts in a season:

# Loading the data with season_play_by_play function: (Note the
# season_play_by_play function takes a few minutes to run)

pbp_2009 <- season_play_by_play(2009)
pbp_2010 <- season_play_by_play(2010)
pbp_2011 <- season_play_by_play(2011)
pbp_2012 <- season_play_by_play(2012)
pbp_2013 <- season_play_by_play(2013)
pbp_2014 <- season_play_by_play(2014)
pbp_2015 <- season_play_by_play(2015)
pbp_2016 <- season_play_by_play(2016)

# Stack the datasets together: (Load the tidyverse first - as if you didn't
# already...)

library(tidyverse)
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> complete(): tidyr, RCurl
#> filter():   dplyr, stats
#> lag():      dplyr, stats

pbp_data <- bind_rows(pbp_2009, pbp_2010, pbp_2011, pbp_2012, pbp_2013, pbp_2014, 
    pbp_2015, pbp_2016)
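
The eight separate calls and the manual stacking can also be written as a loop, which avoids listing each season's object by hand. The following is a minimal sketch in base R, not the package's own method: `load_season` is a hypothetical stand-in that returns made-up rows so the code runs without scraping. For real data, `season_play_by_play` would take its place.

```r
# Hypothetical stand-in for season_play_by_play(): returns a tiny data frame
# with made-up values so the sketch runs offline. Swap in the real function
# to work with actual play-by-play data.
load_season <- function(season) {
  data.frame(
    Season = season,
    Passer = c("A.Passer", "B.Passer"),
    EPA = c(0.25, -0.10),
    stringsAsFactors = FALSE
  )
}

seasons <- 2009:2016

# One data frame per season, kept in a list
pbp_list <- lapply(seasons, load_season)

# Stack the list into a single data frame, in season order
pbp_data <- do.call(rbind, pbp_list)

# Confirm that every season is present and none is repeated
table(pbp_data$Season)
```

With the tidyverse loaded, `bind_rows(pbp_list)` would stack the list in the same way as the `do.call(rbind, ...)` line.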

# Now filter down to only passing attempts, group by the season and passer,
# then calculate the number of passing attempts, total expected points added
# (EPA), EPA per attempt, then finally filter to only those with at least 50
# pass attempts:

passing_stats <- pbp_data %>%
    filter(PassAttempt == 1 & PlayType != "No Play" & !is.na(Passer)) %>%
    group_by(Season, Passer) %>%
    summarise(Attempts = n(),
              Total_EPA = sum(EPA, na.rm = TRUE),
              EPA_per_Att = Total_EPA/Attempts) %>%
    filter(Attempts >= 50)

# Using the ggjoy package (install it with the commented-out line below), we
# can compare the EPA per pass attempt across NFL seasons:
library(ggplot2)
# install.packages('ggjoy')
library(ggjoy)

ggplot(passing_stats, aes(x = EPA_per_Att, y = as.factor(Season))) +
    geom_joy(scale = 3, rel_min_height = 0.01) +
    theme_joy() +
    ylab("Season") +
    xlab("EPA per Pass Attempt") +
    scale_y_discrete(expand = c(0.01, 0)) +
    scale_x_continuous(expand = c(0.01, 0)) +
    ggtitle("The Shifting Distribution of EPA per Pass Attempt") +
    theme(plot.title = element_text(hjust = 0.5, size = 16),
          axis.title = element_text(size = 16),
          axis.text = element_text(size = 16))
#> Picking joint bandwidth of 0.0603
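
To see what the filter and summarise steps do without downloading any data, the same pipeline can be run on a handful of synthetic plays. This is a sketch under stated assumptions: the rows below are made up, the column names are taken from the example above, and the attempt threshold is lowered from 50 to 2 so the toy data produces a result. The `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Synthetic plays with made-up values; column names follow the example above
toy_pbp <- data.frame(
  Season      = c(2015, 2015, 2015, 2015, 2016, 2016),
  Passer      = c("A.Passer", "A.Passer", "A.Passer", NA, "B.Passer", "B.Passer"),
  PassAttempt = c(1, 1, 1, 1, 1, 0),
  PlayType    = c("Pass", "Pass", "No Play", "Pass", "Pass", "Run"),
  EPA         = c(0.5, -0.2, 1.0, 0.3, NA, 0.4),
  stringsAsFactors = FALSE
)

toy_stats <- toy_pbp %>%
  # Keep real pass attempts with a known passer
  filter(PassAttempt == 1, PlayType != "No Play", !is.na(Passer)) %>%
  group_by(Season, Passer) %>%
  summarise(
    Attempts    = n(),                      # counts rows, including NA EPA
    Total_EPA   = sum(EPA, na.rm = TRUE),   # NA EPA values are skipped
    EPA_per_Att = Total_EPA / Attempts,
    .groups = "drop"
  ) %>%
  # Threshold lowered from 50 to 2 to suit the toy data
  filter(Attempts >= 2)

print(toy_stats)
```

One detail this makes visible: a pass attempt with a missing EPA still counts toward `Attempts`, while contributing nothing to `Total_EPA`, so missing values pull `EPA_per_Att` toward zero.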

Version: 1.4.0

License: CC0

Last Published: April 3rd, 2020

Functions in nflscrapR (1.4.0)

drive_summary: Drive Summary and Results
nflteams: Dataset of NFL team names, abbreviations, and colors
game_play_by_play: Parsed Descriptive Play-by-Play Dataset for a Single Game
playerstats11: NFL Team Names and Abbreviations
playerstats15: NFL Team Names and Abbreviations
proper_jsonurl_formatting: Formatting URL for location of NFL Game JSON Data
playerstats13: NFL Team Names and Abbreviations
playerstats14: NFL Team Names and Abbreviations
season_rosters: Season Rosters for Teams
season_player_game: Boxscore for Each Game in the Season - One line per player per game
season_play_by_play: Parsed Descriptive Play-by-Play Function for a Full Season
season_games: Game Information for All Games in a Season
simple_boxscore: Simple Game Boxscore
playerstats09: NFL Team Names and Abbreviations
agg_player_season: Detailed Player Aggregate Season Statistics
playerstats10: NFL Team Names and Abbreviations
player_game: Detailed Boxscore for Single NFL Game
extracting_gameids: Extract GameIDs for each game in a given NFL season
playerstats12: NFL Team Names and Abbreviations
expected_points: Expected point function to calculate expected points for each play in the play by play, and the expected points added in three ways, basic EPA, air yards EPA, and yards after catch EPA
win_probability: Win probability function to add win probability columns for the home and away teams for each play in the game
buildURL: Building URL to scrape player season stat pages
getPageNumbers: Get Number of Player Position Pages
getPlayers: Scrape Player Names and Positions
buildNameAbbr: Build formatted player name from full player name
getGSISID: For a player's href, get their GSIS ID from their personal url.
findPagePlayerID: Find the GSIS ID for each player on the provided page.