
imputeTestbench (version 3.0.3)

impute_errors: Function working as testbench for comparison of imputing models

Description

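Testbench for comparing imputation methods on an input time series: observations are removed at increasing percentages, each method fills them in, and the imputed values are scored against the originals with an error metric.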

Usage

impute_errors(dataIn, smps = "mcar", methods = c("na.approx",
  "na.interp", "na_interpolation", "na.locf", "na_mean"),
  methodPath = NULL, errorParameter = "rmse", errorPath = NULL,
  blck = 50, blckper = TRUE, missPercentFrom = 10,
  missPercentTo = 90, interval = 10, repetition = 10,
  addl_arg = NULL)

Arguments

dataIn

input ts for testing

smps

chr string indicating sampling type for generating missing data, see details

methods

chr string of imputation methods to use, one to many. A user-supplied function can be included if methodPath is used, see details.

methodPath

chr string of location of script containing one or more functions for the proposed imputation method(s)

errorParameter

chr string indicating which error type to use, acceptable values are "rmse" (default), "mae", or "mape". Alternatively, a user-supplied function can be passed if errorPath is used, see details.

errorPath

chr string of location of script containing one or more error functions for evaluating imputations

blck

numeric indicating the block size for the missing data, either as a percentage of the sample size for the missing data or as a number of observations (see blckper); applies only if smps = 'mar'

blckper

logical indicating whether the value passed to blck is a percentage of the sample size for the missing data (TRUE) or a number of observations (FALSE)

missPercentFrom

numeric giving the lowest percentage of missing values to test

missPercentTo

numeric giving the highest percentage of missing values to test

interval

numeric giving the step between consecutive missing percentages

repetition

numeric giving the number of repetitions to run at each missing percentage

addl_arg

arguments passed to other imputation methods as a list of lists, see details.
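
As a rough illustration of how missPercentFrom, missPercentTo, interval, and repetition combine, the sketch below builds the grid of missing percentages in base R. It assumes the grid is an evenly spaced sequence, which is the natural reading of the argument descriptions and not a statement about the package internals; the variable names miss_percent and n_sims are made up for this example.

```r
# Defaults taken from the Usage section
missPercentFrom <- 10
missPercentTo   <- 90
interval        <- 10
repetition      <- 10

# Assumed grid of missing percentages (evenly spaced from/to by interval)
miss_percent <- seq(missPercentFrom, missPercentTo, by = interval)

# Number of simulated series per imputation method under that assumption
n_sims <- length(miss_percent) * repetition

print(miss_percent)
print(n_sims)
```

Under this reading, narrowing the range or widening the interval reduces run time in direct proportion to the number of grid points.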

Value

Returns an error comparison for imputation methods as an errprof object. This object is structured as a list where the first two elements are named Parameter and MissingPercent that describe the error metric used to assess the imputation methods and the intervals of missing observations as percentages, respectively. The remaining elements are named as the chr strings in methods of the original function call. Each remaining element contains a numeric vector of the average error at each missing percent of observations. The errprof object also includes an attribute named errall as an additional list that contains all of the error estimates for every imputation method and repetition.
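
The elements described above can be inspected with ordinary list and attribute accessors. The following is a minimal sketch that only runs if imputeTestbench is installed; it uses the element names given in this section and makes no claim about the exact shape of errall beyond it being a list.

```r
if (requireNamespace("imputeTestbench", quietly = TRUE)) {
  library(imputeTestbench)

  aa <- impute_errors(dataIn = nottem)

  print(aa$Parameter)       # error metric used for the comparison
  print(aa$MissingPercent)  # percentages of missing observations tested
  print(aa[["na.approx"]])  # average error of na.approx at each percentage

  # All individual error estimates, stored as an attribute
  errall <- attr(aa, "errall")
  str(errall, max.level = 1)
}
```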

Details

The default methods for impute_errors are na.approx, na.interp, na_interpolation, na.locf, and na_mean. See the help file for each for additional documentation. Additional arguments for the imputation functions are passed as a list of lists to the addl_arg argument, where the list contains one to many elements that are named by the methods. The elements of the master list are lists with arguments for the relevant methods. See the examples.
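
The list-of-lists shape can be sketched in base R as follows. The na_mean entry repeats the one in the Examples section; the rule argument for na.approx is an assumption taken from the zoo documentation. The fill_const function and the do.call() dispatch are hypothetical and only show how a named inner list can be spliced into a call, not how impute_errors does it internally.

```r
# Outer list: one element per imputation method, named by the method.
# Inner lists: named arguments for that method.
addl_arg <- list(
  na_mean   = list(option = "mode"),  # from the Examples section
  na.approx = list(rule = 2)          # assumed from zoo's na.approx help
)
str(addl_arg)

# Hypothetical stand-in for an imputation method
fill_const <- function(x, value = 0) {
  x[is.na(x)] <- value
  x
}

# Illustrative dispatch: look up the inner list by method name and
# append it to the positional argument (the series to impute)
demo_arg <- list(fill_const = list(value = -1))
method   <- "fill_const"
x        <- c(1, NA, 3)
filled   <- do.call(method, c(list(x), demo_arg[[method]]))
print(filled)
```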

A user-supplied function can also be passed to methods as an additional imputation method. A character string giving the path of the script that defines the function must also be supplied to methodPath. The first argument of the function must be the time series to impute.
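
A minimal sketch of a user-supplied imputation method follows. The function na_median_fill is hypothetical and deliberately simple; the only requirement taken from the text above is that its first argument is the time series to impute. The final call runs only if imputeTestbench is installed.

```r
# Hypothetical imputation method: replace every NA with the series median.
# The first argument is the time series to impute.
na_median_fill <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Quick check on a small vector
imputed <- na_median_fill(c(1, NA, 3, 10, NA))
print(imputed)

# Write the function definition to a script so its path can be passed
# to methodPath
script <- tempfile(fileext = ".R")
dump("na_median_fill", file = script)

if (requireNamespace("imputeTestbench", quietly = TRUE)) {
  library(imputeTestbench)
  bb <- impute_errors(
    dataIn     = nottem,
    methods    = c("na.approx", "na_median_fill"),
    methodPath = script
  )
  print(bb)
}
```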

An alternative error function can also be passed to errorParameter if errorPath is not NULL. The function named in errorParameter must be defined in the script at errorPath and must take two arguments, where the first is a vector for the observed time series and the second is a vector for the predicted time series.
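
A custom error function might look like the sketch below. The metric medae (median absolute error) is a hypothetical example and not part of the package; it follows the two-argument convention described above, with the observed series first and the predicted series second. The final call runs only if imputeTestbench is installed.

```r
# Hypothetical error metric: median absolute error.
# obs  = observed (original) values, pred = imputed (predicted) values
medae <- function(obs, pred) {
  median(abs(obs - pred))
}

# Quick check on small vectors
err <- medae(c(1, 2, 3), c(1, 4, 6))
print(err)

# Write the function definition to a script so its path can be passed
# to errorPath
script <- tempfile(fileext = ".R")
dump("medae", file = script)

if (requireNamespace("imputeTestbench", quietly = TRUE)) {
  library(imputeTestbench)
  cc <- impute_errors(
    dataIn         = nottem,
    errorParameter = "medae",
    errorPath      = script
  )
  print(cc)
}
```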

The smps argument indicates the type of sampling for generating missing data. Options are smps = 'mcar' for missing completely at random and smps = 'mar' for missing at random. Additional information about the sampling method is described in sample_dat. The relevant arguments for smps = 'mar' are blck and blckper, which set the size of the blocks of missing data.
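
To make the difference between the two sampling types concrete, the base R sketch below removes the same number of observations in two ways: scattered single positions and contiguous blocks. It is a simplified illustration and not the algorithm used by sample_dat. It assumes that blck, with blckper = TRUE, is a percentage of the number of missing observations, and all variable and function names are made up for this example.

```r
set.seed(1)

n      <- 100                 # length of the series
n_miss <- round(n * 0.30)     # 30 percent missing

# Scattered sampling: positions drawn individually at random
mcar_idx <- sort(sample(n, n_miss))

# Block sampling: contiguous runs of missing positions
blck      <- 50                                   # percent (assumed meaning)
blck_size <- max(1, round(n_miss * blck / 100))   # observations per block

mar_idx <- integer(0)
while (length(mar_idx) < n_miss) {
  start   <- sample(n - blck_size + 1, 1)
  mar_idx <- union(mar_idx, seq(start, length.out = blck_size))
}
mar_idx <- sort(mar_idx)[seq_len(n_miss)]   # trim to exactly n_miss

# Length of the longest run of consecutive positions
longest_run <- function(idx) {
  r    <- rle(diff(idx))
  runs <- r$lengths[r$values == 1]
  if (length(runs) == 0) 1 else max(runs) + 1
}

print(longest_run(mcar_idx))
print(longest_run(mar_idx))
```

Long gaps are generally harder to impute than isolated missing points, which is why the block settings can change the error comparison noticeably.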

Infinite comparisons are removed with a warning if errorParameter = 'mape'. This occurs if any of the observed values in the original time series are zero. Error estimates for such datasets are evaluated only for non-zero observations.
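
The reason zeros cause trouble is that the percentage error divides by the observed value. The base R sketch below uses a common definition of MAPE, which may differ in detail from the one in the package, and made-up values to show the infinite term and the effect of restricting the calculation to non-zero observations.

```r
obs  <- c(0, 2, 4, 5)   # observed values, including one zero
pred <- c(1, 1, 5, 5)   # imputed values

# Absolute percentage error for each observation
ape <- abs((obs - pred) / obs) * 100
print(ape)

# The zero observation makes the overall mean infinite
mape_all <- mean(ape)
print(mape_all)

# Restricting to non-zero observations gives a finite estimate
mape_nonzero <- mean(ape[obs != 0])
print(mape_nonzero)
```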

See Also

sample_dat

Examples

# NOT RUN {
# default options
aa <- impute_errors(dataIn = nottem)
aa
plot_errors(aa)

# change the simulation for missing obs
aa <- impute_errors(dataIn = nottem, smps = 'mar')
aa
plot_errors(aa)

# use one interpolation method, increase repetitions
aa <- impute_errors(dataIn = nottem, methods = 'na.interp', repetition = 100)
aa
plot_errors(aa)

# change the error metric
aa <- impute_errors(dataIn = nottem, errorParameter = 'mae')
aa
plot_errors(aa)

# passing additional arguments to imputation methods
impute_errors(dataIn = nottem, addl_arg = list(na_mean = list(option = 'mode')))
# }
