baseballr

baseballr is a package written for R focused on baseball analysis. It includes functions for scraping various data from websites, such as FanGraphs.com, Baseball-Reference.com, and baseballsavant.mlb.com. It also includes functions for calculating metrics, such as wOBA, FIP, and team-level consistency over custom time frames.

You can read more about some of the functions and how to use them at its official site as well as this Hardball Times article.

Installation

You can install the CRAN version of baseballr with:

install.packages("baseballr")

You can install the released version of baseballr from GitHub with:

# Install with the pacman package:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("BillPetti/baseballr")
# Alternatively, using the devtools package:
if (!requireNamespace('devtools', quietly = TRUE)){
  install.packages('devtools')
}
devtools::install_github(repo = "BillPetti/baseballr")

For experimental functions in development, you can install the development branch:

# install.packages("devtools")
devtools::install_github("BillPetti/baseballr", ref = "development_branch")

Functionality

The package consists of two main sets of functions: data acquisition and metric calculation.

For example, to see the standings for a specific MLB division on a given date, use the bref_standings_on_date() function. Pass the date as a "YYYY-MM-DD" string along with the division you want; with from = FALSE, the standings reflect games played through that date:

library(baseballr)
library(dplyr)
bref_standings_on_date("2015-08-01", "NL East", from = FALSE)
## ── MLB Standings on Date data from baseball-reference.com ─── baseballr 1.5.0 ──

## ℹ Data updated: 2023-12-25 02:24:44 EST

## # A tibble: 5 × 8
##   Tm        W     L `W-L%` GB       RS    RA `pythW-L%`
##   <chr> <int> <int>  <dbl> <chr> <int> <int>      <dbl>
## 1 WSN      54    48  0.529 --      422   391      0.535
## 2 NYM      54    50  0.519 1.0     368   373      0.494
## 3 ATL      46    58  0.442 9.0     379   449      0.423
## 4 MIA      42    62  0.404 13.0    370   408      0.455
## 5 PHI      41    64  0.39  14.5    386   511      0.374

Right now the function works as far back as 1994, which is when both leagues split into three divisions.
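
The W-L% and pythW-L% columns in the standings above can be rebuilt from the other columns. The sketch below copies the five rows by hand and applies the Pythagorean expectation with an exponent of 1.83. That exponent is an assumption: it agrees with the printed values to three decimals, but the output itself does not state it.

```r
# Standings rows copied from the output above (NL East, 2015-08-01).
standings <- data.frame(
  Tm = c("WSN", "NYM", "ATL", "MIA", "PHI"),
  W  = c(54, 54, 46, 42, 41),
  L  = c(48, 50, 58, 62, 64),
  RS = c(422, 368, 379, 370, 386),
  RA = c(391, 373, 449, 408, 511)
)

# Assumed exponent for the Pythagorean expectation.
pyth_exponent <- 1.83

# Actual winning percentage: wins divided by games played.
standings$win_pct <- standings$W / (standings$W + standings$L)

# Expected winning percentage from runs scored (RS) and runs allowed (RA).
standings$pyth_pct <- standings$RS^pyth_exponent /
  (standings$RS^pyth_exponent + standings$RA^pyth_exponent)

print(round(standings[, c("win_pct", "pyth_pct")], 3))
```

Comparing the two columns shows which teams had won more or fewer games than their run differential suggests.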

You can also pull data for all hitters over a specific date range. Here are the results for all hitters from August 1st through October 3rd during the 2015 season:

data <- bref_daily_batter("2015-08-01", "2015-10-03") 
data %>%
  dplyr::glimpse()
## Rows: 764
## Columns: 30
## $ bbref_id <chr> "machama01", "duffyma01", "altuvjo01", "eatonad02", "choosh01…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ Name     <chr> "Manny Machado", "Matt Duffy", "José Altuve", "Adam Eaton", "…
## $ Age      <dbl> 22, 24, 25, 26, 32, 21, 27, 28, 36, 28, 29, 29, 27, 29, 27, 2…
## $ Level    <chr> "Maj-AL", "Maj-NL", "Maj-AL", "Maj-AL", "Maj-AL", "Maj-AL", "…
## $ Team     <chr> "Baltimore", "San Francisco", "Houston", "Chicago", "Texas", …
## $ G        <dbl> 59, 59, 57, 58, 58, 58, 59, 58, 59, 57, 55, 57, 57, 58, 56, 5…
## $ PA       <dbl> 266, 264, 262, 262, 260, 259, 259, 258, 257, 257, 255, 255, 2…
## $ AB       <dbl> 237, 248, 244, 230, 211, 224, 239, 235, 231, 233, 213, 218, 2…
## $ R        <dbl> 36, 33, 30, 37, 48, 35, 32, 29, 37, 27, 50, 37, 36, 25, 38, 4…
## $ H        <dbl> 66, 71, 81, 74, 71, 79, 54, 66, 75, 48, 65, 56, 61, 51, 78, 5…
## $ X1B      <dbl> 43, 54, 53, 56, 47, 51, 34, 37, 48, 30, 34, 32, 35, 33, 66, 2…
## $ X2B      <dbl> 10, 12, 19, 12, 14, 17, 6, 17, 16, 11, 13, 13, 15, 10, 7, 13,…
## $ X3B      <dbl> 0, 2, 3, 1, 1, 4, 1, 0, 2, 1, 2, 4, 0, 1, 3, 0, 4, 0, 1, 1, 0…
## $ HR       <dbl> 13, 3, 6, 5, 9, 7, 13, 12, 9, 6, 16, 7, 11, 7, 2, 20, 9, 8, 8…
## $ RBI      <dbl> 32, 30, 18, 31, 34, 32, 27, 40, 53, 21, 50, 19, 31, 39, 23, 4…
## $ BB       <dbl> 26, 15, 10, 23, 39, 18, 16, 17, 21, 21, 34, 33, 21, 39, 12, 3…
## $ IBB      <dbl> 1, 0, 1, 1, 1, 0, 0, 6, 1, 1, 0, 1, 1, 5, 0, 4, 3, 3, 7, 2, 2…
## $ uBB      <dbl> 25, 15, 9, 22, 38, 18, 16, 11, 20, 20, 34, 32, 20, 34, 12, 35…
## $ SO       <dbl> 42, 35, 28, 55, 51, 38, 68, 56, 29, 53, 46, 62, 41, 48, 27, 7…
## $ HBP      <dbl> 2, 0, 4, 5, 8, 1, 3, 5, 1, 1, 2, 3, 3, 1, 1, 6, 1, 3, 4, 1, 0…
## $ SH       <dbl> 0, 0, 1, 2, 1, 11, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, …
## $ SF       <dbl> 1, 1, 3, 2, 1, 5, 1, 1, 4, 2, 5, 1, 2, 2, 3, 0, 3, 2, 3, 4, 3…
## $ GDP      <dbl> 5, 9, 6, 1, 1, 4, 2, 2, 9, 7, 5, 1, 4, 8, 1, 2, 3, 10, 5, 4, …
## $ SB       <dbl> 6, 8, 11, 9, 2, 10, 0, 0, 0, 3, 3, 4, 5, 4, 24, 2, 1, 0, 6, 0…
## $ CS       <dbl> 4, 0, 4, 4, 0, 2, 0, 0, 0, 1, 0, 1, 3, 2, 7, 2, 3, 0, 2, 0, 0…
## $ BA       <dbl> 0.279, 0.286, 0.332, 0.322, 0.337, 0.353, 0.226, 0.281, 0.325…
## $ OBP      <dbl> 0.353, 0.326, 0.364, 0.392, 0.456, 0.395, 0.282, 0.341, 0.377…
## $ SLG      <dbl> 0.485, 0.387, 0.508, 0.448, 0.540, 0.558, 0.423, 0.506, 0.528…
## $ OPS      <dbl> 0.839, 0.713, 0.872, 0.840, 0.996, 0.953, 0.705, 0.848, 0.906…

In terms of metric calculation, the package allows the user to calculate the consistency of team scoring and run prevention for any year using team_consistency():

team_consistency(2015)
## # A tibble: 30 × 5
##    Team  Con_R Con_RA Con_R_Ptile Con_RA_Ptile
##    <chr> <dbl>  <dbl>       <dbl>        <dbl>
##  1 ARI    0.37   0.36          17           15
##  2 ATL    0.41   0.4           88           63
##  3 BAL    0.4    0.38          70           42
##  4 BOS    0.39   0.4           52           63
##  5 CHC    0.38   0.41          30           85
##  6 CHW    0.39   0.4           52           63
##  7 CIN    0.41   0.36          88           15
##  8 CLE    0.41   0.4           88           63
##  9 COL    0.35   0.34           7            3
## 10 DET    0.39   0.38          52           42
## # ℹ 20 more rows
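
As a rough illustration of what a consistency score for run scoring can look like, the sketch below computes a Gini coefficient on per-game run totals, where lower values mean runs are spread more evenly across games. Treating consistency as a Gini coefficient is an assumption made for this example, not a description of what team_consistency() does internally. The gini() helper and both run vectors are hypothetical.

```r
# Hypothetical helper: Gini coefficient of a non-negative numeric vector.
# 0 means every game has the same run total; values near 1 mean the runs
# are concentrated in very few games.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Made-up runs scored per game for two imaginary teams.
steady_team  <- c(4, 5, 4, 5, 4, 5, 4, 5, 4, 5)
streaky_team <- c(0, 12, 1, 10, 0, 9, 2, 11, 0, 0)

print(round(c(steady = gini(steady_team), streaky = gini(streaky_team)), 2))
```

The steady team scores close to zero on this measure, while the streaky team, whose runs come in a handful of big games, scores much higher.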

You can also calculate wOBA per plate appearance and wOBA on contact for any set of data over any date range, provided you have the data available.

Simply pass the proper data frame to woba_plus:

data %>%
  dplyr::filter(PA > 200) %>%
  woba_plus() %>%
  dplyr::arrange(desc(wOBA)) %>%
  dplyr::select(Name, Team, season, PA, wOBA, wOBA_CON) %>%
  dplyr::glimpse()
## Rows: 117
## Columns: 6
## $ Name     <chr> "Edwin Encarnación", "Bryce Harper", "David Ortiz", "Joey Vot…
## $ Team     <chr> "Toronto", "Washington", "Boston", "Cincinnati", "Baltimore",…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ PA       <dbl> 216, 248, 213, 251, 253, 260, 245, 255, 223, 241, 223, 259, 2…
## $ wOBA     <dbl> 0.490, 0.450, 0.449, 0.445, 0.434, 0.430, 0.430, 0.422, 0.410…
## $ wOBA_CON <dbl> 0.555, 0.529, 0.541, 0.543, 0.617, 0.495, 0.481, 0.494, 0.459…

You can also generate these wOBA-based stats, as well as FIP, for pitchers using the fip_plus() function:

bref_daily_pitcher("2015-04-05", "2015-04-30") %>% 
  fip_plus() %>% 
  dplyr::select(season, Name, IP, ERA, SO, uBB, HBP, HR, FIP, wOBA_against, wOBA_CON_against) %>%
  dplyr::arrange(dplyr::desc(IP)) %>% 
  head(10)
## ── MLB Daily Pitcher data from baseball-reference.com ─────── baseballr 1.5.0 ──

## ℹ Data updated: 2023-12-25 02:27:52 EST

## # A tibble: 10 × 11
##    season Name               IP   ERA    SO   uBB   HBP    HR   FIP wOBA_against
##     <int> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
##  1   2015 Johnny Cueto     37    1.95    38     4     2     3  2.62        0.21 
##  2   2015 Dallas Keuchel   37    0.73    22    11     0     0  2.84        0.169
##  3   2015 Sonny Gray       36.1  1.98    25     6     1     1  2.69        0.218
##  4   2015 Mike Leake       35.2  3.03    25     7     0     5  4.16        0.24 
##  5   2015 Félix Hernández  34.2  1.82    36     6     3     1  2.2         0.225
##  6   2015 Corey Kluber     34    4.24    36     5     2     2  2.4         0.295
##  7   2015 Jake Odorizzi    33.2  2.41    26     8     1     0  2.38        0.213
##  8   2015 Josh Collmenter  32.2  2.76    16     3     0     1  2.82        0.29 
##  9   2015 Bartolo Colón    32.2  3.31    25     1     0     4  3.29        0.28 
## 10   2015 Zack Greinke     32.2  1.93    27     7     1     2  3.01        0.24 
## # ℹ 1 more variable: wOBA_CON_against <dbl>
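
The FIP column can be checked by hand with the usual formula, (13*HR + 3*(BB + HBP) - 2*SO) / IP plus a league constant. The sketch below does that for three rows from the output above. The constant of 3.134 is an assumption for 2015 and differs by season, and FIP_manual is a made-up column name.

```r
# Three rows with whole-number innings, copied from the output above.
pitchers <- data.frame(
  Name = c("Johnny Cueto", "Dallas Keuchel", "Corey Kluber"),
  IP   = c(37, 37, 34),
  SO   = c(38, 22, 36),
  uBB  = c(4, 11, 5),
  HBP  = c(2, 0, 2),
  HR   = c(3, 0, 2)
)

# Assumed league constant for 2015; it changes every season.
fip_constant <- 3.134

# FIP uses only outcomes the pitcher controls: home runs, walks,
# hit batters, and strikeouts, scaled by innings pitched.
pitchers$FIP_manual <- with(
  pitchers,
  (13 * HR + 3 * (uBB + HBP) - 2 * SO) / IP + fip_constant
)

print(round(pitchers$FIP_manual, 2))
```

Only whole-inning rows are used here on purpose. In box-score notation an IP value such as 36.1 means 36 and one third innings, so a general version would have to decide how to convert those values before dividing.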

Issues

Please leave any suggestions or bugs in the Issues section.

Pull Requests

Pull requests are welcome, but I cannot guarantee that they will be accepted or accepted quickly. Please make all pull requests to the development branch for review.

Breaking Changes

Full News on Releases

Follow the SportsDataverse (@SportsDataverse) on Twitter and star this repo

Our Authors

  • Bill Petti (@BillPetti)

  • Saiem Gilani (@saiemgilani)

Our Contributors (they’re awesome)

  • Ben Baumer (@BaumerBen)

  • Ben Dilday (@BenDilday)

  • Robert Frey (@RobertFrey40)

  • Camden Kay (@k_camden)

Citations

To cite the baseballr R package in publications, use:

BibTeX Citation

@misc{petti_gilani_2021,
  author = {Bill Petti and Saiem Gilani},
  title = {baseballr: The SportsDataverse's R Package for Baseball Data.},
  url = {https://billpetti.github.io/baseballr/},
  year = {2021}
}

Version: 1.6.0
License: MIT + file LICENSE
Monthly Downloads: 1,502
Last Published: January 16th, 2024

Functions in baseballr (1.6.0)

column_structure_draft_mlb: Column structure of the MLB Draft data
edge_frequency: Edge Percentage Frequency
edge_code: Edge Code
daily_pitcher_bref: (legacy) Scrape Pitcher Performance Data Over a Custom Time Frame
daily_batter_bref: (legacy) Scrape Batter Performance Data Over a Custom Time Frame
csv_from_url: Load .csv / .csv.gz file from a remote connection
chadwick: Chadwick Bureau Register Player Lookup
playername_lookup: Look up Baseball Player Name by ID
fg_guts: Scrape FanGraphs.com Guts!
chadwick_player_lu: Download the Chadwick Bureau's public register of baseball players
fangraphs: FanGraphs Functions Overview
fg_milb_batter_game_logs: Scrape MiLB game logs for batters from FanGraphs
fg_bat_leaders: (legacy) Scrape Batter Leaderboards from FanGraphs
fg_batter_game_logs: Scrape Batter Game Logs from FanGraphs
fg_batter_leaders: Scrape Batter Leaderboards from FanGraphs
fg_park: Scrape Park Factors from FanGraphs
fg_milb_pitcher_game_logs: Scrape MiLB game logs for pitchers from FanGraphs
fg_pitcher_game_logs: Scrape Pitcher Game Logs from FanGraphs
fg_pitch_leaders: (legacy) Scrape Pitcher Leaderboards from FanGraphs
fg_fielder_leaders: Scrape Fielder Leaderboards from FanGraphs
get_batting_orders: (legacy) Retrieve batting orders for a given MLB game
fip_plus: Calculate FIP and related metrics for any set of data
get_draft_mlb: (legacy) Retrieve draft pick information by year
fg_team_fielder: Scrape Team Fielder Leaderboards from FanGraphs
fg_team_pitcher: Scrape Team Pitcher Leaderboards from FanGraphs
get_game_info_sup_petti: (legacy) Download a data frame of supplemental data about MLB games since 2008.
get_probables_mlb: (legacy) Retrieve probable starters for a given MLB game
get_pbp_mlb: (legacy) Acquire pitch-by-pitch data for Major and Minor League games
get_game_info_mlb: (legacy) Retrieve additional game information for major and minor league games
get_ncaa_game_logs: (legacy) Get NCAA Baseball Game Logs
get_retrosheet_data: (legacy) Get, Parse, and Format Retrosheet Event and Roster Files
get_ncaa_lineups: (legacy) Retrieve lineups for a given NCAA game via its game_info_url
get_ncaa_park_factor: (legacy) Get Park Effects for NCAA Baseball Teams
get_game_pks_mlb: (legacy) Get MLB Game Info by Date and Level
metrics: Metrics Functions Overview
load_umpire_ids: Download a data frame of all umpires and their mlbamids for games since 2008
fg_pitcher_leaders: Scrape Pitcher Leaderboards from FanGraphs
ggspraychart: Generate spray charts with ggplot2
get_umpire_ids_petti: (legacy) Download a data frame of all umpires and their MLBAM IDs for games since 2008
label_statcast_imputed_data: Label Statcast data as imputed
fg_team_batter: Scrape Team Batter Leaderboards from FanGraphs
get_ncaa_schedule_info: (legacy) Get Schedule and Results for NCAA Baseball Teams
load_ncaa_baseball_pbp: Load cleaned NCAA baseball play-by-play data from the baseballr data repo
milb_batter_game_logs_fg: (legacy) Scrape MiLB game logs for batters from FanGraphs
milb_pitcher_game_logs_fg: (legacy) Scrape MiLB game logs for pitchers from FanGraphs
mlb_all_star_ballots: Find MLB All-Star Ballots
mlb: MLB Functions Overview
mlb_event_types: MLB Event Types
mlb_draft_prospects: Retrieve draft prospect information by year
mlb_draft: Retrieve draft pick information by year
load_game_info_sup: Download a data frame of supplemental data about MLB games since 2008.
linear_weights_savant: Generate linear weight values for events using Baseball Savant data
mlb_conferences: View all PCL conferences
mlb_awards: MLB Awards
mlb_divisions: MLB Divisions
get_ncaa_baseball_pbp: (legacy) Get Play-By-Play Data for NCAA Baseball Games
load_ncaa_baseball_season_ids: Load cleaned NCAA men's college baseball season IDs from the baseballr data repo
ncaa_baseball_roster: (legacy) Get NCAA Baseball Rosters
mlb_game_content: Retrieve additional game content for major and minor league games
load_ncaa_baseball_teams: Load cleaned NCAA men's college baseball teams from the baseballr data repo
mlb_game_changes: Acquire time codes for Major and Minor League games
mlb_fielder_detail_types: MLB Fielder Detail Types
mlb_all_star_write_ins: Find MLB All-Star Write-ins
mlb_all_star_final_vote: Find MLB All-Star Final Vote
mlb_game_context_metrics: Acquire game context metrics for Major and Minor League games
mlb_homerun_derby_bracket: Retrieve Homerun Derby Bracket
mlb_awards_recipient: MLB Award Recipients
mlb_homerun_derby_players: Retrieve Homerun Derby Players
mlb_baseball_stats: MLB Baseball Stats
mlb_batting_orders: Retrieve batting orders for a given MLB game
mlb_game_types: MLB Game Types
mlb_draft_latest: Retrieve latest draft information by year
mlb_game_linescore: Retrieve game linescores for major and minor league games
mlb_game_info: Retrieve additional game information for major and minor league games
mlb_people: Find Biographical Information for MLB Players
load_ncaa_baseball_schedule: Load cleaned NCAA baseball schedule from the baseballr data repo
mlb_attendance: MLB Attendance
mlb_league: MLB Leagues
mlb_award: MLB All-Star, Awards, Home Run Derby Functions
mlb_game_wp: Acquire win probability for Major and Minor League games
mlb_game_status_codes: MLB Game Status Codes
mlb_high_low_stats: Acquire high/low stats for Major and Minor Leagues
mlb_high_low_types: MLB Stat High/Low Types
mlb_positions: MLB Positions
mlb_player_status_codes: MLB Player Status Codes
mlb_people_free_agents: Find Information About MLB Free Agents
mlb_schedule_postseason: Find game_pk values for professional baseball postseason games (major and minor leagues)
mlb_schedule_postseason_series: Find game_pk values for professional baseball postseason series games (major and minor leagues)
mlb_game_timecodes: Acquire time codes for Major and Minor League games
mlb_jobs_datacasters: MLB Jobs Datacasters
mlb_jobs_official_scorers: MLB Jobs Official Scorers
mlb_game_pace: Retrieve game pace metrics for major and minor league
mlb_game_pks: Get MLB Game Info by Date and Level
mlb_league_leader_types: MLB League Leader Types
mlb_situation_codes: MLB Situation Codes
mlb_player_game_stats_current: Find MLB Player Game Stats - Current Game
mlb_player_game_stats: Find MLB Player Game Stats
mlb_probables: Retrieve probable starters for a given MLB game
mlb_review_reasons: MLB Review Reasons
mlb_hit_trajectories: MLB Hit Trajectories
mlb_sky: MLB Sky (Weather) Codes
mlb_sports_players: MLB Sport Players
mlb_teams_stats: MLB Teams Stats
mlb_logical_events: MLB Logical Events
mlb_metrics: MLB Metrics
mlb_roster_types: MLB Roster Types
mlb_schedule_event_types: MLB Schedule Event Types
mlb_homerun_derby: Retrieve Homerun Derby data
mlb_languages: MLB API Language Options
mlb_rosters: Find MLB Rosters by Roster Type
mlb_jobs_umpires: MLB Jobs Umpires
mlb_schedule_games_tied: Find game_pk values for professional baseball games (major and minor leagues) that are tied
mlb_stats_leaders: MLB Stats Leaders
mlb_team_affiliates: MLB Team Affiliates
ncaa_lineups: Retrieve lineups for a given NCAA game via its game_info_url
ncaa_park_factor: Get Park Effects for NCAA Baseball Teams
ncaa_teams: Scrape NCAA baseball Teams (Division I, II, and III)
mlb_venues: Find MLB Venues
mlb_teams: MLB Teams
mlb_team_stats: MLB Team Individual Stats
mlb_standings: MLB Standings
pitcher_game_logs_fg: (legacy) Scrape Pitcher Game Logs from FanGraphs
mlb_teams_stats_leaders: MLB Teams Stats Leaders
ncaa_team_player_stats: Scrape NCAA baseball Team Player Stats (Division I, II, and III)
ncaa_game_logs: Get NCAA Baseball Game Logs
ncaa_scrape: (legacy) Scrape NCAA baseball Team Player Stats (Division I, II, and III)
ncaa: NCAA Functions Overview
mlb_wind_direction_codes: MLB Wind Direction Codes
rds_from_url: Load .rds file from a remote connection
mlb_jobs: MLB Jobs
mlb_job_types: MLB Job Types
mlb_pbp: Acquire pitch-by-pitch data for Major and Minor League games
statcast: Statcast Functions Overview
mlb_pbp_diff: Acquire pitch-by-pitch data between two timecodes for Major and Minor League games
team_consistency: Calculate Team-level Consistency
request_with_proxy: Retry http request with proxy
mlb_pitch_types: MLB Pitch Types
statcast_search: Query Statcast by Date Range and Players
statcast_leaderboards: Query Baseball Savant Leaderboards
statcast_impute: Statcast Label Imputation
mlb_pitch_codes: MLB Pitch Codes
team_results_bref: (legacy) Scrape Team Results
mlb_seasons: Find MLB Seasons
mlb_seasons_all: Find MLB Seasons all
mlb_runner_detail_types: MLB Runner Detail Types
mlb_standings_types: MLB Standings Types
mlb_stat_groups: MLB Stat Groups
mlb_stat_types: MLB Stat Types
mlb_schedule: Find game_pk values for professional baseball games (major and minor leagues)
scrape_savant_leaderboards: (legacy) Query Baseball Savant Leaderboards
mlb_team_history: MLB Teams History
school_id_lu: (legacy) Lookup NCAA baseball school IDs (Division I, II, and III)
mlb_sports: MLB Sport IDs
mlb_sports_info: MLB Sport IDs Information
mlb_team_personnel: MLB Team Personnel
mlb_team_leaders: MLB Team Leaders
mlb_team_info: MLB Team Info
ncaa_schedule_info: Get Schedule and Results for NCAA Baseball Teams
statline_from_statcast: Create stat lines from Statcast data
mlb_stats: MLB Stats
stats_api_live_empty_df: Column structure of MLB Stats Live Game API data frame
most_recent_ncaa_baseball_season: Most Recent NCAA Baseball Season
most_recent_mlb_season: Most Recent MLB Season
ncaa_school_id_lu: Lookup NCAA baseball school IDs (Division I, II, and III)
sptrc_team_active_payroll: Scrape Team Active Payroll Breakdown from Spotrac
standings_on_date_bref: (legacy) Scrape MLB Standings on a Given Date
sptrc_league_payrolls: Scrape League Payroll Breakdowns from Spotrac
scrape_statcast_savant: (legacy) Query Statcast by Date Range and Players
mlb_team_alumni: MLB Team Alumni
mlb_team_coaches: MLB Team Coaches
teams_lu_table: A Team Lookup Table
woba_plus: Calculate wOBA and related metrics for any set of data
ncaa_pbp: Get Play-By-Play Data for NCAA Baseball Games
process_statcast_payload: Process Baseball Savant CSV payload
ncaa_roster: Get NCAA Baseball Rosters
progressively: Progressively
run_expectancy_code: Generate run expectancy and related measures from Baseball Savant data
retrosheet_data: Get, Parse, and Format Retrosheet Event and Roster Files
bref_standings_on_date: Scrape MLB Standings on a Given Date
chadwick_path: Check Chadwick installation
playerid_lookup: Look up Baseball Player IDs by Player Name
batter_game_logs_fg: (legacy) Scrape Batter Game Logs from FanGraphs
bref_team_results: Scrape Team Results
bref: Baseball Reference Functions Overview
bref_daily_pitcher: Scrape Pitcher Performance Data Over a Custom Time Frame
bref_daily_batter: Scrape Batter Performance Data Over a Custom Time Frame
baseballr-package: baseballr: Acquiring and Analyzing Baseball Data
code_barrel: Helper for determining whether a batted ball is a "barrel"