This dataset contains batting statistics for the 2002 baseball season. The data allows you to compute batting averages, on-base percentages, and other statistics of interest to baseball fans. It includes only players with more than 100 at bats for a team in that year. The data is excerpted with permission from the Lahman baseball database at http://www.seanlahman.com/.
data(batting)
A data frame with 438 observations on the following 22 variables.
A coded player identifier; those familiar with the players should still be able to find their favorites.
a numeric vector. Always 2002 in this dataset.
a numeric vector. Player's stint (order of appearances within a season).
a factor identifying the team
a factor with levels AL and NL
number of games played
number of at bats
number of runs
number of hits
number of doubles. "2B" in original database.
number of triples. "3B" in original database.
number of home runs
number of runs batted in
number of stolen bases
number of times caught stealing
number of bases on balls (walks)
number of strikeouts
number of intentional walks
number of times hit by a pitch
number of sacrifice hits
number of sacrifice flies
number of times grounded into a double play
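Because a player who changed teams during 2002 has one row per stint, season totals for that player come from summing the counting statistics across his rows before dividing. The sketch below uses a small hypothetical data frame; the column names `playerID` and `stint` are assumptions (the variable names are not shown above), while `AB` and `H` match the example code.

```r
# Hypothetical rows standing in for batting; playerID/stint names and all values are assumed
toy <- data.frame(
  playerID = c("playerA", "playerA", "playerB"),
  stint    = c(1, 2, 1),
  AB       = c(200, 150, 400),
  H        = c(50, 45, 120)
)

# Sum the counting statistics over all stints of each player
season <- aggregate(cbind(AB, H) ~ playerID, data = toy, FUN = sum)

# Batting average from season totals, not the mean of per-stint averages
season$BA <- season$H / season$AB
print(season)
```

Since the dataset keeps only stints with more than 100 at bats, totals built this way may omit a player's shorter stints.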
Baseball fans are “statistics” crazy. They love to talk about things like RBIs, BAs and OBPs, and to do so they need the numbers. This data comes from the Lahman baseball database at http://www.seanlahman.com/. The complete database covers all of baseball, not just the 2002 season presented here.
Beyond this dataset, the book Curve Ball by J. Albert and J. Bennett (Copernicus Books) gives an extensive statistical analysis of baseball.
See https://www.baseball-almanac.com/stats.shtml for definitions of common baseball statistics.
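As one more example of a common statistic, slugging percentage is total bases per at bat, and OPS adds it to on-base percentage. The sketch below assumes the standard definition of total bases (singles + 2 × doubles + 3 × triples + 4 × home runs); the function and argument names are hypothetical, and the season line is made up.

```r
# Slugging percentage: total bases divided by at bats.
# singles + 2*doubles + 3*triples + 4*HR simplifies to H + doubles + 2*triples + 3*HR,
# because singles = H - doubles - triples - HR.
slugging <- function(H, doubles, triples, HR, AB) {
  (H + doubles + 2 * triples + 3 * HR) / AB
}

# Hypothetical season line for one player
H <- 150; doubles <- 30; triples <- 5; HR <- 25; AB <- 500
BB <- 60; HBP <- 5; SF <- 5

slg <- slugging(H, doubles, triples, HR, AB)
obp <- (H + BB + HBP) / (AB + BB + HBP + SF)  # same OBP formula as the example code
ops <- obp + slg                              # OPS = on-base plus slugging
print(c(SLG = slg, OBP = obp, OPS = ops))
```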
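library(UsingR)  # the batting data ships with the UsingR package
data(batting)
# with() evaluates each expression inside the data frame, avoiding attach()
BA  <- with(batting, H / AB)                                 # batting average
OBP <- with(batting, (H + BB + HBP) / (AB + BB + HBP + SF))  # on-base "percentage"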
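To find leaders, it can help to store the statistics as new columns and sort on them. This sketch uses a hypothetical three-row data frame; `AB`, `H`, `BB`, `HBP` and `SF` follow the example code, while `playerID` and all values are assumptions.

```r
# Hypothetical rows standing in for batting
toy <- data.frame(
  playerID = c("p1", "p2", "p3"),
  AB  = c(500, 450, 300),
  H   = c(150, 117, 96),
  BB  = c(60, 40, 20),
  HBP = c(5, 2, 1),
  SF  = c(5, 3, 4)
)

# Add the statistics as columns of the data frame
toy <- transform(toy,
  BA  = H / AB,
  OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
)

# Leaders by batting average, highest first
leaders <- toy[order(toy$BA, decreasing = TRUE), c("playerID", "BA", "OBP")]
print(leaders)
```

Keeping the results inside the data frame ties each statistic to its player row, which `attach()` plus free-standing vectors does not.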