Learn R Programming

StatsBombR

By: StatsBomb

Support: support@statsbombservices.com

Updated March 22, 2020.

This repository is an R package to easily stream StatsBomb data into R using your log in credentials for the API or free data from our GitHub page. API access is for paying customers only

This package offers a parallel option to most computationally expensive functions. However, it is currently only designed for Windows.

Installation Instructions:

  1. Please first make sure you are on version 3.6.2 or later of R before attempting to install
  2. If not yet installed into R, run: install.packages("devtools")
  3. Also run:

install.packages("remotes") and then remotes::install_version("SDMTools", "1.1-221") 4. Then, install this R package as: devtools::install_github("statsbomb/StatsBombR") 5. Finally, library(StatsBombR)

This package depends on several other packages in order for all functions to run. Therefore, if you have problems with any functions or with installing the package, it is likely due to package dependencies.

Free Data

Free Data Instructions:

Welcome to the Free Data Offerings from StatsBomb Services.

This package is reading in the open access dat found on https://github.com/statsbomb/open-data. Below you will find a list of the functions used to quickly read in all open data currently available. Check back often as new data is regularly added.

Free Data Description and Privacy Policy

StatsBomb are committed to sharing new data and research publicly to enhance understanding of the game of Football. We want to actively encourage new research and analysis at all levels. Therefore we have made certain leagues of StatsBomb Data freely available for public use for research projects and genuine interest in football analytics.

StatsBomb are hoping that by making data freely available, we will extend the wider football analytics community and attract new talent to the industry. We would like to collect some basic personal information about users of our data. By giving us your email address, it means we will let you know when we make more data, tutorials and research available. We will store the information in accordance with our Privacy Policy and the GDPR.

Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please register your details on https://www.statsbomb.com/resource-centre and read our User Agreement carefully.

Terms & Conditions

By using this repository, you are agreeing to the user agreement.

If you publish, share or distribute any research, analysis or insights based on this data, please state the data source as StatsBomb and use our logo.

To read in all free events available:

StatsBombData <- StatsBombFreeEvents()

To read in all of the free competitions we offer simply run:

FreeCompetitions()

or, for use in other functions, store it as a data frame object:

Comp <- FreeCompetitions()

To read in the free matches available:

Matches <- FreeMatches(Comp)

To read in free events for a certain game:

get.matchFree(Matches[1,])

It is important to note, that the argument here is the entire row returns from "FreeMatches", this is because there is information from each match observation that is needed in the get.matchFree function.

API Data

API Access Instructions:

API access is for paying customers only

To read in the competitions available through StatsBomb, run:

  1. competitions <- competitions(username, password)

To read in the matches available in each competition, run:

  1. matches <- get.matches(username, password, season_id, competition_id)

To read in all of the matches for various competitions.

  1. Pull Competitions From the API: comps <- competitions(username, password)
  2. Filter for the competitions you want: EuropeComps <- comps %>% filter(country_name == "Europe")
  3. Create a matrix of the competition and season ids: competitionmatrix <- as.matrix(EuropeComps[,1:2])
  4. Pull all of the matches: Matches <- MultiCompMatches(username, password, competitionmatrix)

To read in events for one game, simply run:

  1. StatsBombData <- get.events(username, password, match_id)

Note: A previous version of this function was named get.match(), get.match() is now deprecated).

To read in events for multiple games, run:

  1. Create a vector of match IDs:matchids <- matchesvector(username, password, season_id, competition_id)
  2. StatsBombData <- allevents(username, password, matchids)

Note: See documentation for additional parameters available to access different API versions, run in parallel or not, choose a specific number of cores. (A previous version of this function was named allmatches(), allmatches() is now deprecated).

To read in all of the events for various competitions.

  1. Pull Competitions From the API: comps <- competitions(username, password)
  2. Filter for the competitions you want: EuropeComps <- comps %>% filter(country_name == "Europe")
  3. Create a matrix of the competition and season ids: competitionmatrix <- as.matrix(EuropeComps[,1:2])
  4. Pull all of the events: Events <- MultiCompEvents(username, password, competitionmatrix)

To read in the lineups for one game, run:

  1. lineups <- get.lineups(username, password, match_id)

To read in multiple lineups, run:

  1. matchids <- matchesvector(username, password, season_id, competition_id)
  2. StatsBombLineups <- allineups(username, password, matchids, parallel = T)

To unnest all of the lineups:

StatsBombLineups <- cleanlineups(StatsBombLineups)

Data Cleaning Helpers:

Although JSON files can often be a pain to clean, especially due to nested data frames, these helper functions may make your data wrangling much easier.

To clean all of the data at once:

StatsBombData <- allclean(StatsBombData)

This function cleans the data in one line of code by running each of the functions below sequentially.

To clean all of the location variables simply run:

StatsBombData <- cleanlocations(StatsBombData)

Please note all location variables must be present in the data set. This function will not work with a subset of variables (i.e. if any location variables are missing).

To add the goalkeeper information from the freeze frame:

StatsBombData <- goalkeeperinfo(StatsBombData)

Please note that additional information is located under type.name == "Goal Keeper" and within the Freeze Frames.

To add additional shot information:

StatsBombData <- shotinfo(StatsBombData)

To extract some information from the freeze frame:

StatsBombData <- freezeframeinfo(StatsBombData)

Description of these variables:

  • Density is calculated as the aggregated inverse distance for each defender behind the ball.
  • Density in the cone is the density filtered for only defenders who are in the cone between the shooter, and each goal post.

To format the elapsed time from the start of a match:

StatsBombData <- formatelapsedtime(StatsBombData)

To add in information about the current possession within a match:

StatsBombData <- possessioninfo(StatsBombData)

Final Notes:

  • Some of the cleaning functions above depend on variables created in the functions presented before them. In order to be safe, please clean your data in the order that is presented in this document.
  • Please re-install frequently, as new functions and bug fixes will be added regularly.
  • As always, check out the Rdocumentation for each function (ex. ?StatsBombFreeEvents()) for more specific description.
  • Please contact support@statsbombservices.com with bugs and suggestions.

Copy Link

Version

Version

0.1.0

License

3.5.0

Maintainer

Derrick Yam

Last Published

September 8th, 2021

Functions in StatsBombR (0.1.0)

allmatches

This function returns all events from the matches specified in the arguments.
StatsBombFreeLineups

This function returns all lineups for the matches requested, or all matches available. This is for use with the freely released data from StatsBomb.com.
StatsBombFreeEvents

This function returns all events for the matches requested, or all matches available. This is for use with the freely released data from StatsBomb.com.
get.matches

This function returns all matches from the specified season and competition.
defensiveinfo

This function returns the defensive information from the shot freeze frame variable.
get.minutesplayed

This function returns the players from each match, the number of minutes played and the times of their substitution, if applicable.
cleanlineups

This function unnests all lineups and returns a cleaned data frame.
cleanlocations

This function cleans all of the location variables in a StatsBomb data frame.
MultiCompMatches

This function returns all matches from the competitions and seasons specified in the arguments.
get.lineupsFree

This function returns all lineups for the match requested. This is for use with the freely released data from StatsBomb.com.
MultiCompEvents

This function returns all events from the competitions and seasons specified in the arguments.
get.matchFree

This function returns all events for the match requested. This is for use with the freely released data from StatsBomb.com.
JoinPlayerNickName

This function returns joins player nick names from the lineups API into the event data.
get.playerfootedness

This function returns defines a player's footedness based on the percent of time they use each foot.
FreeCompetitions

This function returns all competitions that are released as free data from StatsBomb.com.
formatelapsedtime

This function uses the timestamp and period information to create an elapsed time from the beginning of the match variable.
freezeframeinfo

This function returns the defensive information from the shot freeze frame variable.
FreeMatches

This function returns all free matches that are released as free data from StatsBomb.com.
alllineups

This function returns all lineups from the matches specified in the arguments.
annotate_pitchSB

This function returns field markings in ggplot. This is adapted from the ggsoccer package to fit StatsBomb Data.
getpositioncategory

This function returns the players' id and 1 of 6 position categories based on when they play in each most often.
matchesvector

This function returns all matches from the specified season and competition.
get.opposingteam

This function joins the opposing team in each match to all events in the data frame.
goalkeeperinfo

This function returns the goalkeeper information from the shot freeze frame variable.
getmatch

This function returns all events for the match specified in the arguments.
possessioninfo

Returns a data frame with all of the original data frame all cleaned for R with extra information added.
shotinfo

This function returns extra shot information.
theme_SB

This function returns the theme consistent with reports written by StatsBomb.
getgamestate

This function returns additional variables for the game state, score and opposing score. It also returns a dataframe of the time each team spent winning.
get.lineups

This function returns all lineups for the match specified in the arguments.
allclean

Returns a data frame with all of the original data frame all cleaned for R with extra information added.
competitions

This function returns all competitions that StatsBomb has data for.
team_season

This function returns aggregated metrics from the IQ API for all teams in a given league-season.
player_match

This function returns metrics from the IQ API for all players in a given match.
player_all_matches

This function returns match-by-match metrics from the IQ API for all players in a given league-season. Different to player_season, which pulls per 90 aggregated metrics.
player_season

This function returns aggregated metrics from the IQ API for all players in a given league-season.
FreeJoinPlayerNickname

This function returns and joins player nicknames from the free lineups API into the event data.
player_match_multicomp

This function returns match-by-match metrics from the IQ API for all players across multiple leagues and/or seasons.