cvdm: Cross-Validated Difference in Means (CVDM) Test

Description

Applies the cross-validated log-likelihood difference in means (CVDM) test to compare two methods of estimating a formula. The output indicates which of the two methods is more appropriate for the data.

When using this test to choose between OLS and MR, please cite:

Harden, J. J., & Desmarais, B. A. (2011). Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods. State Politics & Policy Quarterly, 11(4), 371-389. 10.1177/1532440011408929

For other applications of the CVDM test, please cite:

Desmarais, B. A., & Harden, J. J. (2014). An Unbiased Model Comparison Test Using Cross-Validation. Quality & Quantity, 48(4), 2155-2173. 10.1007/s11135-013-9884-7

Usage

cvdm(
  formula,
  data,
  method1 = c("OLS", "MR", "RLM", "RLM-MM"),
  method2 = c("OLS", "MR", "RLM", "RLM-MM"),
  subset,
  na.action,
  ...
)

Arguments

formula

A formula object, with the dependent variable on the left of a ~ operator, and the independent variables on the right.

data

A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.

method1

The first estimation method to compare. The available options are ordinary least squares ("OLS"), median regression ("MR"), robust linear regression using M-estimation ("RLM"), and robust linear regression using MM-estimation ("RLM-MM"). Median regression is fit with the modified version of the Barrodale and Roberts algorithm for l1-regression, which is the default in the rq function of the quantreg package; see the quantreg rq documentation for more details. The robust regressions are fit by iterated re-weighted least squares (IWLS) using the rlm function from the MASS package. M-estimation ("RLM") is the rlm default. MM-estimation ("RLM-MM") is M-estimation with Tukey's biweight, initialized by a specific S-estimate. See the MASS rlm documentation for details.

method2

The second estimation method to compare. The options are the same as for method1.

subset

Expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.

na.action

A missing-data filter function, applied to the model.frame, after any subset argument has been used.

...

Optional arguments, currently unsupported.
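
The four method options described above correspond to standard fitting functions. The following is a minimal sketch of what each option fits, assuming the quantreg and MASS packages are installed; the simulated data and the object name fits are illustrative, and the exact calls made inside cvdm may differ.

```r
# Illustrative only: the kinds of fits behind "OLS", "MR", "RLM", and "RLM-MM".
set.seed(42)
dat <- data.frame(X = runif(100, -1, 1))
dat$Y <- 0.2 + 0.5 * dat$X + rnorm(100)

fits <- list(OLS = lm(Y ~ X, data = dat))  # ordinary least squares

if (requireNamespace("quantreg", quietly = TRUE)) {
  # Median regression: tau = 0.5, Barrodale and Roberts algorithm ("br")
  fits$MR <- quantreg::rq(Y ~ X, tau = 0.5, data = dat, method = "br")
}

if (requireNamespace("MASS", quietly = TRUE)) {
  # Robust regression by M-estimation (the rlm default)
  fits$RLM <- MASS::rlm(Y ~ X, data = dat, method = "M")
  # Robust regression by MM-estimation
  fits$`RLM-MM` <- MASS::rlm(Y ~ X, data = dat, method = "MM")
}

# Compare the estimated intercept and slope across the available fits
sapply(fits, coef)
```

With normally distributed errors, as simulated here, the coefficient estimates from all four fits should be close to one another; the methods diverge mainly when the data contain outliers.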

Value

An object of class cvdm containing the results of the cross-validated difference in means (CVDM) test. The reported test statistic is the cross-validated Johnson's t-test. A positive test statistic supports the first method and a negative test statistic supports the second. See cvdm_object for more details.

Details

This function implements the cross-validated difference in means (CVDM) test between two methods of estimating a formula. The function takes a formula and two methods and computes a vector of cross-validated log-likelihoods (CVLLs) for each method using the leave-one-out method. The output test score is the cross-validated Johnson's t-test. A positive test statistic supports the first method and a negative test statistic supports the second. Leave-one-out iterations that produce a singular matrix are skipped.
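
The leave-one-out procedure can be sketched in base R. This is a simplified illustration, not the package's implementation: the helper loo_cvll_ols is hypothetical, it assumes a normal density for OLS, it uses an intercept-only model as a stand-in for the second method, and it computes an ordinary one-sample t statistic where cvdm reports Johnson's t-test.

```r
# Hypothetical helper: leave-one-out cross-validated log-likelihoods for OLS,
# assuming normally distributed errors.
loo_cvll_ols <- function(formula, data) {
  n <- nrow(data)
  cvll <- numeric(n)
  for (i in seq_len(n)) {
    train <- data[-i, , drop = FALSE]
    test <- data[i, , drop = FALSE]
    fit <- lm(formula, data = train)            # fit without observation i
    sigma_hat <- sqrt(mean(residuals(fit)^2))   # ML estimate of the error SD
    mu_i <- predict(fit, newdata = test)        # prediction for the held-out row
    y_i <- model.response(model.frame(formula, test))
    cvll[i] <- dnorm(y_i, mean = mu_i, sd = sigma_hat, log = TRUE)
  }
  cvll
}

set.seed(1)
d <- data.frame(X = runif(50, -1, 1))
d$Y <- 0.2 + 0.5 * d$X + rnorm(50)

cvll_1 <- loo_cvll_ols(Y ~ X, d)  # first model
cvll_2 <- loo_cvll_ols(Y ~ 1, d)  # stand-in second model (intercept only)

# Difference in means of the CVLLs, scaled by its standard error.
# A plain t statistic is used here in place of Johnson's t-test.
diffs <- cvll_1 - cvll_2
t_plain <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))
t_plain
```

As in the test itself, a positive value favors the first model and a negative value favors the second.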

References

Harden, J. J., & Desmarais, B. A. (2011). Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods. State Politics & Policy Quarterly, 11(4), 371-389. 10.1177/1532440011408929
Desmarais, B. A., & Harden, J. J. (2014). An Unbiased Model Comparison Test Using Cross-Validation. Quality & Quantity, 48(4), 2155-2173. 10.1007/s11135-013-9884-7

Examples

# NOT RUN {
  set.seed(123456)
  b0 <- .2 # True value for the intercept
  b1 <- .5 # True value for the slope
  n <- 500 # Sample size
  X <- runif(n, -1, 1)

  Y <- b0 + b1 * X + rnorm(n, 0, 1) # N(0, 1) error

  obj_cvdm <- cvdm(Y ~ X, data.frame(cbind(Y, X)), method1 = "OLS", method2 = "MR")
# }