scorer

scorer is a set of tools for quickly scoring models in data science and machine learning. Where possible, the toolset is written in C++ for speed. Its API follows that of Python's sklearn.metrics as closely as possible, so one can switch between R and Python without much cognitive overhead. The following types of metrics are currently implemented in scorer:

  • Regression metrics (implemented in 0.2.0)

The following types of metrics are soon to be implemented in scorer:

  • Classification metrics (to be implemented in 0.3.0)
  • Multilabel ranking metrics (to be implemented in 0.3.0)
  • Clustering metrics (to be implemented in 0.3.0)
  • Biclustering metrics (to be implemented in 0.3.0)
  • Pairwise metrics (to be implemented in 0.3.0)

Installation

You can install:

  • the latest released version from CRAN with

    install.packages("scorer")

  • the latest development version from GitHub with

    # Install devtools if it is missing or older than 1.6
    if (!requireNamespace("devtools", quietly = TRUE) ||
        packageVersion("devtools") < "1.6") {
      install.packages("devtools")
    }
    devtools::install_github("paulhendricks/scorer")

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

News

scorer 0.2.0

Improvements

  • All functions from scorer 0.1.0 have been deprecated in favor of a new API that mirrors the API of sklearn.metrics. These functions will be removed in 1.0.0.
  • Added more functions!
  • Nearly all functions implemented in C++ for blazing fast speed!
  • Additional features, such as sample weighting for some error metrics, have been identified and deferred to future releases.
  • Implemented unit tests for base functions.
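The deprecation described above can be sketched with base R's .Deprecated(). This is a minimal illustration, not scorer's actual source: the function bodies are plain-R stand-ins, and the argument names y_true and y_pred are assumptions borrowed from sklearn.metrics.

```r
# Stand-in for the new-style function (plain R, not scorer's C++ code).
mean_absolute_error <- function(y_true, y_pred) {
  mean(abs(y_true - y_pred))
}

# Old-style alias: emit a deprecation warning, then delegate to the new name.
mae <- function(y_true, y_pred) {
  .Deprecated("mean_absolute_error")
  mean_absolute_error(y_true, y_pred)
}

# The alias still returns a value; the warning is suppressed here for brevity.
result <- suppressWarnings(mae(c(1, 2, 3), c(2, 2, 5)))
```

Keeping the old name as a thin wrapper lets existing scripts continue to run while pointing users at the replacement.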

scorer 0.1.0

Improvements

  • Implemented several functions for estimating errors.
  • Implemented unit tests for nearly all functions.
  • First minor version release to CRAN!

Bug fixes

  • Fixed minor error in passing multiple arguments to mae().

API

Regression metrics

Load library and data

library("scorer")
packageVersion("scorer")
#> [1] '0.2.0'
data(mtcars)

Visualize data

library("ggplot2")
ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point() + 
  geom_smooth(method = 'lm') + 
  expand_limits(x = c(0, 6), y = c(0, 40))

Partition data into train and test sets

set.seed(1)
n_train <- floor(nrow(mtcars) * 0.60)
n_test <- nrow(mtcars) - n_train
mask <- sample(c(rep(x = TRUE, times = n_train), rep(x = FALSE, times = n_test)))
mtcars[, "Type"] <- ifelse(mask, "Train", "Test")
train_mtcars <- mtcars[mask, ]
test_mtcars <- mtcars[!mask, ]
ggplot(mtcars, aes(x = wt, y = mpg, color = Type)) + 
  geom_point() + 
  expand_limits(x = c(0, 6), y = c(0, 40))
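As a quick sanity check on the split above, the mask should contain floor(32 * 0.60) = 19 training rows and 13 test rows regardless of the seed. The sketch below rebuilds the mask on a fresh copy of mtcars; the names cars and split_sizes are introduced here only for illustration.

```r
data(mtcars)
cars <- mtcars  # fresh copy, used only in this check

set.seed(1)
n_train <- floor(nrow(cars) * 0.60)
n_test <- nrow(cars) - n_train

# Shuffle a vector holding exactly n_train TRUEs and n_test FALSEs.
mask <- sample(c(rep(TRUE, n_train), rep(FALSE, n_test)))

# Split sizes are fixed by construction; only row membership varies.
split_sizes <- c(train = sum(mask), test = sum(!mask))
print(split_sizes)
```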

Build a model on the train data set

model <- lm(mpg ~ wt, data = train_mtcars)

Predict on the test data set using the model

test_mtcars[, "predicted_mpg"] <- predict(model, newdata = test_mtcars)

Score model using various metrics

scorer::mean_absolute_error(test_mtcars[, "mpg"], test_mtcars[, "predicted_mpg"])
#> [1] 3.287805
scorer::mean_squared_error(test_mtcars[, "mpg"], test_mtcars[, "predicted_mpg"])
#> [1] 15.43932
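For readers without scorer installed, the two scores above can be approximated in base R. The helper names mae_base and mse_base are hypothetical, and they assume scorer uses the textbook definitions of mean absolute error and mean squared error. Note also that R 3.6.0 changed the default behavior of sample(), so on newer R versions the split, and therefore the scores, may differ from the values printed above, which were produced under R 3.2.3.

```r
# Textbook definitions; assumed, not taken from scorer's source.
mae_base <- function(y_true, y_pred) mean(abs(y_true - y_pred))
mse_base <- function(y_true, y_pred) mean((y_true - y_pred)^2)

# Rebuild the train/test split on a clean copy of the data.
data(mtcars)
set.seed(1)
n_train <- floor(nrow(mtcars) * 0.60)
mask <- sample(c(rep(TRUE, n_train), rep(FALSE, nrow(mtcars) - n_train)))

# Fit on the training rows, predict on the held-out rows.
model <- lm(mpg ~ wt, data = mtcars[mask, ])
test_set <- mtcars[!mask, ]
predicted <- predict(model, newdata = test_set)

mae_value <- mae_base(test_set$mpg, predicted)
mse_value <- mse_base(test_set$mpg, predicted)
```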

Build a final model on all the data

final_model <- lm(mpg ~ wt, data = mtcars)

Predict on the original data set using the final model

mtcars[, "predicted_mpg"] <- predict(final_model, newdata = mtcars)

Score final model using various metrics

scorer::explained_variance_score(mtcars[, "mpg"], mtcars[, "predicted_mpg"])
#> [1] 847.7252
scorer::unexplained_variance_score(mtcars[, "mpg"], mtcars[, "predicted_mpg"])
#> [1] 278.3219
scorer::total_variance_score(mtcars[, "mpg"], mtcars[, "predicted_mpg"])
#> [1] 1126.047
scorer::r2_score(mtcars[, "mpg"], mtcars[, "predicted_mpg"])
#> [1] 0.7528328
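The four values above are consistent with a sum-of-squares decomposition: 847.7252 + 278.3219 ≈ 1126.047, and 847.7252 / 1126.047 ≈ 0.7528. The base-R sketch below reproduces that relationship for the same model. It assumes the three *_variance_score functions return sums of squares, which is inferred from the printed output rather than from scorer's documentation; the variable names are illustrative.

```r
data(mtcars)
final_model <- lm(mpg ~ wt, data = mtcars)
actual <- mtcars$mpg
predicted <- fitted(final_model)

# Sum-of-squares pieces around the mean of the observed values.
ss_explained <- sum((predicted - mean(actual))^2)  # regression sum of squares
ss_unexplained <- sum((actual - predicted)^2)      # residual sum of squares
ss_total <- sum((actual - mean(actual))^2)         # total sum of squares

# For least squares with an intercept, explained + unexplained = total,
# so R^2 can be written as 1 - unexplained / total.
r2 <- 1 - ss_unexplained / ss_total
print(c(ss_explained, ss_unexplained, ss_total, r2))
```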

Classification metrics

# TO BE UPDATED

People

License

Session Information

sessionInfo()
#> R version 3.2.3 (2015-12-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: OS X 10.11.3 (El Capitan)
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_2.0.0 scorer_0.2.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.3      digest_0.6.9     plyr_1.8.3       grid_3.2.3      
#>  [5] gtable_0.1.2     formatR_1.2.1    magrittr_1.5     evaluate_0.8    
#>  [9] scales_0.3.0     stringi_1.0-1    rmarkdown_0.8.1  labeling_0.3    
#> [13] tools_3.2.3      stringr_1.0.0    munsell_0.4.2    yaml_2.1.13     
#> [17] colorspace_1.2-6 htmltools_0.2.6  knitr_1.12

Monthly Downloads: 19

Version: 0.2.0

License: MIT + file LICENSE

Last Published: February 1st, 2016

Functions in scorer (0.2.0)

  • absolute_percent_error: Calculate absolute percent error regression loss.
  • mean_absolute_scaled_error: Calculate mean absolute scaled error regression loss.
  • mean_absolute_error: Calculate mean absolute error regression loss.
  • median_squared_log_error: Calculate median squared log error regression loss.
  • r2_score: Calculate R^2 (coefficient of determination) regression score function.
  • percent_error: Calculate percent error regression loss.
  • median_percent_error: Calculate median percent error regression loss.
  • ae: Calculate absolute error between actual and forecast.
  • rmse: Calculate root mean squared error.
  • explained_variance_score: Calculate explained variance regression score function.
  • pe: Calculate percent error between actual and forecast.
  • mean_squared_error: Calculate mean squared error regression loss.
  • symmetric_mean_absolute_percent_error: Calculate symmetric mean absolute percent error regression loss.
  • mean_squared_log_error: Calculate mean squared log error regression loss.
  • e: Calculate error between actual and forecast.
  • ape: Calculate absolute percent error between actual and forecast.
  • total_variance_score: Calculate total variance regression score function.
  • mean_absolute_percent_error: Calculate mean absolute percent error regression loss.
  • median_squared_error: Calculate median squared error regression loss.
  • log_error: Calculate log error regression loss.
  • unexplained_variance_score: Calculate unexplained variance regression score function.
  • scorer: Quickly Score Models in Data Science and Machine Learning.
  • squared_error: Calculate squared error regression loss.
  • median_absolute_percent_error: Calculate median absolute percent error regression loss.
  • mean_percent_error: Calculate mean percent error regression loss.
  • mean_error: Calculate mean error regression loss.
  • absolute_error: Calculate absolute error regression loss.
  • median_absolute_error: Calculate median absolute error regression loss.
  • symmetric_median_absolute_percent_error: Calculate symmetric median absolute percent error regression loss.
  • mape: Calculate mean absolute percent error between actual and forecast.
  • squared_log_error: Calculate squared log error regression loss.
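Several of the names above correspond to widely used formulas. The sketch below gives base-R versions of three of them under their common textbook definitions. The _sketch suffixes mark these as hypothetical helpers, and scorer's own implementations (for example, whether percent errors are scaled by 100) may differ.

```r
# Root mean squared error: square root of the mean squared residual.
rmse_sketch <- function(y_true, y_pred) {
  sqrt(mean((y_true - y_pred)^2))
}

# Median absolute error: less sensitive to a few large residuals.
median_absolute_error_sketch <- function(y_true, y_pred) {
  median(abs(y_true - y_pred))
}

# Mean absolute percent error, expressed as a fraction of the actual value.
# Not defined when any actual value is zero.
mape_sketch <- function(y_true, y_pred) {
  mean(abs((y_true - y_pred) / y_true))
}

# Small illustrative vectors (made up for this example).
y_true <- c(10, 20, 40)
y_pred <- c(12, 18, 40)
scores <- c(
  rmse = rmse_sketch(y_true, y_pred),
  median_ae = median_absolute_error_sketch(y_true, y_pred),
  mape = mape_sketch(y_true, y_pred)
)
```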