Applies the cross-validated log-likelihood difference in means (CVDM) test to compare two methods of estimating a formula. The output identifies the more appropriate method.
In choosing between OLS and MR, please cite:
Harden, J. J., & Desmarais, B. A. (2011). Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods. State Politics & Policy Quarterly, 11(4), 371-389. 10.1177/1532440011408929
For other applications of the CVDM test, please cite:
Desmarais, B. A., & Harden, J. J. (2014). An Unbiased Model Comparison Test Using Cross-Validation. Quality & Quantity, 48(4), 2155-2173. 10.1007/s11135-013-9884-7
cvdm(
formula,
data,
method1 = c("OLS", "MR", "RLM", "RLM-MM"),
method2 = c("OLS", "MR", "RLM", "RLM-MM"),
subset,
na.action,
...
)
A formula object, with the dependent variable on the left of a ~ operator, and the independent variables on the right.
A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
A method to estimate the model. Currently accepts
Ordinary Least Squares ("OLS"), Median Regression ("MR"), Robust Linear
Regression using M-estimation ("RLM"), and Robust Linear Regression using
MM-estimation ("RLM-MM"). Median regression is fit with the modified version
of the Barrodale and Roberts algorithm for l1-regression, which is the default
method of the rq function in the R package quantreg.
See the quantreg rq function documentation for more details.
The robust regressions are fit by iterated re-weighted least squares
(IWLS) using the rlm function from the MASS package.
MM-estimation is M-estimation with Tukey's biweight, initialized by a specific
S-estimate. M-estimation, selected in this package with the
option "RLM", is the default for the MASS rlm function.
See the MASS package rlm documentation for details.
A method to estimate the model. Options are the same as for method1.
Expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.
A missing-data filter function, applied to the model.frame, after any subset argument has been used.
Optional arguments, currently unsupported.
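The method descriptions above refer to the rq function in quantreg and the rlm function in MASS. As a rough illustration of what each method label corresponds to, the sketch below calls those fitters directly on simulated data. The data frame `dat` and the object names are made up for this example, and the calls are not necessarily the exact ones cvdm makes internally.

```r
# Illustrative only: direct calls to the fitters named in the method
# descriptions. `dat` and the fit_* names are hypothetical.
set.seed(42)
dat <- data.frame(X = runif(100, -1, 1))
dat$Y <- 0.2 + 0.5 * dat$X + rnorm(100)

fit_ols <- lm(Y ~ X, data = dat)                        # "OLS"
fit_m   <- MASS::rlm(Y ~ X, data = dat, method = "M")   # "RLM"
fit_mm  <- MASS::rlm(Y ~ X, data = dat, method = "MM")  # "RLM-MM"

# quantreg is not shipped with R, so only fit if it is installed.
if (requireNamespace("quantreg", quietly = TRUE)) {
  # method = "br" is the Barrodale and Roberts algorithm
  fit_mr <- quantreg::rq(Y ~ X, tau = 0.5, data = dat, method = "br")  # "MR"
}

# Compare the estimated intercept and slope across fitters.
rbind(OLS = coef(fit_ols), RLM = coef(fit_m), `RLM-MM` = coef(fit_mm))
```

With well-behaved normal errors, as here, the estimators should give similar coefficients; they tend to diverge when the data contain outliers.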
An object of class cvdm
computed by the cross-validated log-likelihood
difference in means (CVDM) test. The object contains the result of the cross-validated Johnson's t-test.
A positive test statistic supports the first method and a negative test statistic supports
the second. See cvdm_object
for more details.
This function implements the cross-validated difference in means (CVDM) test between two methods of estimating a formula. The function takes a formula and two methods and computes a vector of cross-validated log-likelihoods (CVLLs) for each method using the leave-one-out method. The output test statistic comes from the cross-validated Johnson's t-test. A positive test statistic supports the first method and a negative test statistic supports the second. Leave-one-out iterations that produce singular matrices are skipped.
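The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the OLS fit is scored with a normal density, median regression is replaced by a crude least-absolute-deviations fit via optim scored with a Laplace density (the package uses quantreg's rq), and `johnson_t` is a hypothetical helper that writes out the standard skewness-adjusted form of Johnson's t statistic. The package's exact computation may differ.

```r
# Hypothetical sketch of the CVDM idea; all names here are made up.
set.seed(1)
n <- 60
dat <- data.frame(X = runif(n, -1, 1))
dat$Y <- 0.2 + 0.5 * dat$X + rnorm(n)

# Leave-one-out CVLL for OLS, scored with a normal density.
cvll_ols <- vapply(seq_len(n), function(i) {
  fit <- lm(Y ~ X, data = dat[-i, ])
  sigma <- sqrt(mean(residuals(fit)^2))        # ML estimate of error SD
  mu <- sum(coef(fit) * c(1, dat$X[i]))        # prediction for held-out row
  dnorm(dat$Y[i], mean = mu, sd = sigma, log = TRUE)
}, numeric(1))

# Crude stand-in for median regression: minimise the sum of absolute
# residuals with optim, starting from the OLS coefficients.
lad_coef <- function(x, y) {
  start <- unname(coef(lm(y ~ x)))
  optim(start, function(b) sum(abs(y - b[1] - b[2] * x)))$par
}

# Leave-one-out CVLL for the LAD fit, scored with a Laplace density.
cvll_lad <- vapply(seq_len(n), function(i) {
  b <- lad_coef(dat$X[-i], dat$Y[-i])
  scale <- mean(abs(dat$Y[-i] - b[1] - b[2] * dat$X[-i]))  # ML Laplace scale
  mu <- b[1] + b[2] * dat$X[i]
  -log(2 * scale) - abs(dat$Y[i] - mu) / scale
}, numeric(1))

# Johnson's skewness-adjusted t statistic for H0: mean(d) = 0.
johnson_t <- function(d) {
  n <- length(d)
  m <- mean(d)
  s2 <- var(d)
  m3 <- mean((d - m)^3)                        # third central moment
  (m + m3 / (6 * s2 * n) + m3 / (3 * s2^2) * m^2) / sqrt(s2 / n)
}

# Positive values favour the first method (OLS), negative the second.
t_stat <- johnson_t(cvll_ols - cvll_lad)
t_stat
```

The sign convention matches the description above: the statistic is computed on the CVLLs of the first method minus those of the second.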
Harden, J. J., & Desmarais, B. A. (2011). Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods. State Politics & Policy Quarterly, 11(4), 371-389. 10.1177/1532440011408929
Desmarais, B. A., & Harden, J. J. (2014). An Unbiased Model Comparison Test Using Cross-Validation. Quality & Quantity, 48(4), 2155-2173. 10.1007/s11135-013-9884-7
library(modeLLtest) # Provides cvdm()

set.seed(123456)
b0 <- .2 # True value for the intercept
b1 <- .5 # True value for the slope
n <- 500 # Sample size
X <- runif(n, -1, 1)
Y <- b0 + b1 * X + rnorm(n, 0, 1) # N(0, 1) error
# Compare OLS (method1) with median regression (method2)
obj_cvdm <- cvdm(Y ~ X, data.frame(Y, X), method1 = "OLS", method2 = "MR")