Learn R Programming

BaseballDBR

Install

  • The latest development version from GitHub:
devtools::install_github("keberwein/baseballDBR")

Gathering Data

The baseballDBR package requires data that is formatted similar to the Baseball Databank or Sean Lahman's Baseball Database. The package also contains the get_bbdb() function, which allows us to download the most up-to-date tables directly from the Chadwick Bureau's GitHub repository. For example, we can easily load the "Batting" table into our R environment.

library(baseballDBR)

get_bbdb(table = "Batting")
head(Batting)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#> 4 allisdo01   1871     1    WS3 <NA> 27 133 28 44  10   2  2  27  1  1  0
#> 5 ansonca01   1871     1    RC1 <NA> 25 120 29 39  11   3  0  16  6  2  2
#> 6 armstbo01   1871     1    FW1 <NA> 12  49  9 11   2   1  0   5  0  1  0
#>   SO IBB HBP SH SF GIDP
#> 1  0  NA  NA NA NA   NA
#> 2  0  NA  NA NA NA   NA
#> 3  5  NA  NA NA NA   NA
#> 4  2  NA  NA NA NA   NA
#> 5  1  NA  NA NA NA   NA
#> 6  1  NA  NA NA NA   NA

Use with the Lahman Package

library(Lahman)
library(baseballDBR)

Batting <- Lahman::Batting
head(Batting)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO   NA  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1   NA 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1   NA 29 137 28 40   4   5  0  19  3  1  2
#> 4 allisdo01   1871     1    WS3   NA 27 133 28 44  10   2  2  27  1  1  0
#> 5 ansonca01   1871     1    RC1   NA 25 120 29 39  11   3  0  16  6  2  2
#> 6 armstbo01   1871     1    FW1   NA 12  49  9 11   2   1  0   5  0  1  0
#>   SO IBB HBP SH SF GIDP
#> 1  0  NA  NA NA NA   NA
#> 2  0  NA  NA NA NA   NA
#> 3  5  NA  NA NA NA   NA
#> 4  2  NA  NA NA NA   NA
#> 5  1  NA  NA NA NA   NA
#> 6  1  NA  NA NA NA   NA

Adding Basic Metrics

Simple batting metrics can be easily added to any batting data frame. For example, we can add slugging percentage, on-base percentage and on-base plus slugging. Note that OPS and OBP appears as "NA" for the years before IBB was tracked.

library(baseballDBR)

Batting$SLG <- SLG(Batting)

Batting$OBP <- OBP(Batting)

head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO   NA  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1   NA 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1   NA 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP   SLG OBP
#> 1  0  NA  NA NA NA   NA 0.000  NA
#> 2  0  NA  NA NA NA   NA 0.322  NA
#> 3  5  NA  NA NA NA   NA 0.394  NA

Advanced Metrics

The package includes a suite of advanced metrics such as wOBA, RAA, and FIP, among others. Many of the advanced metrics require multiple tables. For example, the wOBA metric requires the Batting, Pitching, and Fielding tables in order to establish a player's regular defensive position.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = T)
head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP      wOBA
#> 1  0  NA  NA NA NA   NA 0.0000000
#> 2  0  NA  NA NA NA   NA 0.2855902
#> 3  5  NA  NA NA NA   NA 0.3078849

The code above uses Fangraphs wOBA values. The default behavior is to uses Tom Tango's adapted SQL formula. Other options include Sep.Leagues, which may act as a buffer to any bias created by the designated hitter.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = F, Sep.Leagues = T)
head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP wOBA
#> 1  0  NA  NA NA NA   NA   NA
#> 2  0  NA  NA NA NA   NA   NA
#> 3  5  NA  NA NA NA   NA   NA

We can also produce a data frame that only shows the wOBA multipliers. Notice the Fangraphs wOBA multipliers slightly differ from the Tango multipliers.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

fangraphs_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=T)
head(fangraphs_woba, 3)
#>   yearID lg_woba woba_scale   wBB  wHBP   w1B   w2B   w3B   wHR runSB
#> 1   2017   0.319      1.197 0.692 0.722 0.878 1.237 1.560 1.994   0.2
#> 2   2016   0.318      1.212 0.691 0.721 0.878 1.242 1.569 2.015   0.2
#> 3   2015   0.313      1.251 0.687 0.718 0.881 1.256 1.594 2.065   0.2
#>    runCS lg_r_pa lg_r_w  cFIP
#> 1 -0.419   0.121  9.967 3.109
#> 2 -0.410   0.118  9.778 3.147
#> 3 -0.392   0.113  9.421 3.134

tango_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=F)
head(tango_woba, 3)
#> # A tibble: 3 x 35
#> # Groups:   yearID, RperOut, runBB, runHBP, run1B, run2B, run3B, runHR,
#> #   runSB, runCS [3]
#>   yearID    AB     R     H   X2B   X3B    HR    SB    CS    BB    SO   IBB
#>    <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1   1871 23179  5659  6616   950   495   101   948   270   817   371     0
#> 2   1872 34755  7487 10003  1212   293    88   536   264   477   532     0
#> 3   1873 40346  8487 11832  1308   472   102   395   253   747   552     0
#> # ... with 23 more variables: HBP <dbl>, SF <dbl>, RperOut <dbl>,
#> #   runBB <dbl>, runHBP <dbl>, run1B <dbl>, run2B <dbl>, run3B <dbl>,
#> #   runHR <dbl>, runSB <dbl>, runCS <dbl>, runMinus <dbl>, runPlus <dbl>,
#> #   lg_woba <dbl>, woba_scale <dbl>, wBB <dbl>, wHBP <dbl>, w1B <dbl>,
#> #   w2B <dbl>, w3B <dbl>, wHR <dbl>, wSB <dbl>, wCS <dbl>

Create Local Database

A relational database is not needed to work with these data. However, we may want to store the data to be called more quickly at a later time. We can download all of the tables at once with the get_bbdb() function and then write them to an empty schema in our favorite database. The example uses a newly created PostgreSQL instance, but other database tools can be used assuming an appropriate R package exists.

library(baseballDBR)
library(RPostgreSQL)

# Load all tables into the Global Environment.
get_bbdb(AllTables = TRUE)

# Make a list of all data frames.
dbTables <- names(Filter(isTRUE, eapply(.GlobalEnv, is.data.frame)))

# Load data base drivers and load all data frames in a loop.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host= "localhost", dbname= "lahman", user= "YOUR_USERNAME", password = "YOUR_PASSWORD")

for (i in 1:length(dbTables)) { 
    if (dbExistsTable(con, dbTables[i])) {
        dbRemoveTable(dbTables[i])
    }
    dbWriteTable(con, name =  dbTables[i], value = get0(dbTables[i])) 
}

# Disconnect from database.
dbDisconnect(con)
rm(con, drv)

Copy Link

Version

Install

install.packages('baseballDBR')

Monthly Downloads

25

Version

0.1.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Kris Eberwein

Last Published

June 15th, 2017

Functions in baseballDBR (0.1.2)

Batting2016

A sample subset of the Batting table from the Baseball Databank for the year 2016.
CTpct

Batting: Calculate a batter's contact rate
BA

Batting: Calculate batting average
BABIP

Batting: Calculate batting average on balls in play (BABIP)
BB_9

Pitching: Calculate walks per nine innings
BBpct

Batting: Calculate base on ball percentage
Ch

Fielding: Calculate defensive chances
FIP

Pitching: Fielding Independent Pitching (FIP)
OBP

Batting: Calculate on base percentage (OBP)
OPS

Batting: Calculate on base percentage plus slugging (OPS)
H_9

Pitching: Calculate Hits per Nine innings
ISO

Batting: Calculate ISO for batters
K_9

Pitching: Calculate Strikes per Nine innings
RCtech

Batting: Calculate Runs Created using the technical formula.
SLG

Batting: Calculate slugging percentage (SLG)
wOBA_values

Return wOBA values per season
wRAA

Batting: Calculate Weighted Runs Above Average (wRAA)
RC2002

Batting: Calculate Runs Created using the updated 2002 formula.
RCbasic

Batting: Calculate Runs Created using the basic formula.
fip_values

Return FIP constants per season
Fielding2016

A sample subset of the Fielding table from the Baseball Databank for the year 2016.
Fld_pct

Fielding: Calculate batting average
Kpct

Batting: Calculate strikeout percentage
LOB_pct

Pitching: Calculate the left on base percentage
HR_9

Pitching: Calculate Home Runs per Nine innings
HRpct

Batting: Calculate home run percentage
PA

Batting: Calculate plate appearances for batters
Pitching2016

A sample subset of the Pitching table from the Baseball Databank for the year 2016.
%>%

Pipe operator
seasonAVG

League averages and aggregates from 1901 to present.
get_bbdb

Get an up to date copy of the Baseball Databank.
wRC

Batting: Calculate Weighted Runs Created (wRC)
XBHpct

Batting: Calculate extra base percentage
XBperH

Batting: Calculate extra base per hit
urlExists

urlExists
wOBA

Batting: Calculate Weighted On-Base Average (wOBA)
TBs

Batting: Calculate a batter's total bases
WHIP

Pitching: Calculate Walks plus Hits per Innings Pitched
baseballDBR

baseballDBR: A package for working with data from the Baseball Databank/Lahman Database.
%<>%

Compound_pipe
IP

Pitching: Calculate the innings pitched