mStats (version 3.2.2)

regress: Linear Regression Model

Description

regress() produces regression output that mirrors the output of Stata.

Usage

regress(data, y, ..., robust = FALSE, plot = FALSE, rnd = 3)

Arguments

data

Dataset

y

Dependent variable

...

Independent variable or multiple variables

robust

logical: if TRUE, robust standard errors are calculated; use this when heteroskedasticity is detected in the data. If FALSE (the default), OLS standard errors are estimated.

plot

logical: if TRUE, produces plots for checking model assumptions

rnd

number of decimal places used to round the output. See round.

Value

A list containing three data frames and the fitted model

Details

regress is based on lm. All statistics presented in the function's output are derived from lm, except the AIC value, which is obtained from AIC.

Outputs

Outputs can be divided into three parts.

  1. Information about the model

This part reports the number of observations (Obs.), the F value, the p-value from the F test, the R Squared value, the Adjusted R Squared value, the square root of the mean square error (Root MSE) and the AIC value.

  2. Errors

Output from anova(model) is tabulated here. SS, DF and MS denote the sum of squares, degrees of freedom and mean square, respectively.

  3. Regression Output

Coefficients from the summary of the model are tabulated here along with their 95% confidence intervals.
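The three parts described above can be approximated with base R alone. The following is a minimal sketch, not the internal code of regress(): the object names (model_info, anova_tab, coef_tab) and the column labels are made up for illustration, and the layout of the real output may differ.

```r
# Sketch only: rebuild the three output parts from an lm fit using base R.
# Object and column names here are illustrative, not those used by mStats.
aq  <- na.omit(airquality[, c("Ozone", "Wind")])
fit <- lm(Ozone ~ Wind, data = aq)
s   <- summary(fit)

# 1. Information about the model
f <- s$fstatistic
model_info <- data.frame(
  Obs      = nobs(fit),
  F        = unname(f["value"]),
  p_value  = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
  R2       = s$r.squared,
  Adj_R2   = s$adj.r.squared,
  Root_MSE = s$sigma,      # residual standard error = sqrt(mean square error)
  AIC      = AIC(fit)      # taken from AIC(), as noted in Details
)

# 2. Errors: sums of squares, degrees of freedom and mean squares
anova_tab <- anova(fit)

# 3. Regression output: coefficient table plus 95% confidence intervals
coef_tab <- cbind(coef(s), confint(fit, level = 0.95))

print(round(model_info, 3))
print(anova_tab)
print(round(coef_tab, 3))
```

Rounding to 3 decimal places here matches the default rnd = 3 shown under Usage.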

Using Robust Standard Errors

If heteroskedasticity is present in the data sample, the ordinary least squares (OLS) estimator remains unbiased and consistent, but it is not efficient. The estimated OLS standard errors are biased, and this bias cannot be resolved by a larger sample size. To remedy this, robust standard errors can be used to adjust the standard errors.

$$\widehat{\mathrm{Var}}_{\mathrm{robust}}(\hat{\beta}) = \frac{N}{N - K} \, (X'X)^{-1} \left( \sum_{i=1}^{N} X_i X_i' e_i^2 \right) (X'X)^{-1}$$

where N = the number of observations, K = the number of regressors (including the intercept), X_i = the vector of regressors for observation i, and e_i = the corresponding OLS residual. This returns a variance-covariance (VCV) matrix whose diagonal elements are the estimated heteroskedasticity-robust coefficient variances, which are the quantities of interest. The estimated coefficient standard errors are the square roots of these diagonal elements.
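To make the formula concrete, the sketch below evaluates it directly with base R matrix operations on an lm fit. It is an illustration under assumptions, not the code that regress() runs internally; the variable names (bread, meat, vcv_robust) are chosen here for readability.

```r
# Sketch only: evaluate the robust variance formula by hand.
aq  <- na.omit(airquality[, c("Ozone", "Wind")])
fit <- lm(Ozone ~ Wind, data = aq)

X <- model.matrix(fit)   # N x K design matrix, including the intercept column
e <- residuals(fit)      # OLS residuals e_i
N <- nrow(X)
K <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^(-1)
meat  <- crossprod(X * e)      # sum over i of X_i X_i' e_i^2

# Robust variance-covariance matrix with the N / (N - K) correction
vcv_robust <- (N / (N - K)) * bread %*% meat %*% bread

# Standard errors are the square roots of the diagonal elements
se_robust <- sqrt(diag(vcv_robust))
se_ols    <- sqrt(diag(vcov(fit)))

print(cbind(OLS = se_ols, Robust = se_robust))
```

X * e scales row i of X by e_i, so crossprod(X * e) gives the middle sum without an explicit loop. This form of the estimator is commonly labelled HC1; if the sandwich package is installed, sandwich::vcovHC(fit, type = "HC1") should give the same matrix and can serve as a cross-check.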

Note: Credits to Kevin Goulding, The Tarzan Blog.

Examples

## load the package
library(mStats)

## use airquality dataset
data(airquality)
codebook(airquality)
summ(airquality)

## linear model for Ozone
regress(airquality, Ozone, Wind)

## run again with robust standard errors and with plots to check assumptions
regress(airquality, Ozone, Wind, robust = TRUE, plot = TRUE)

## linear model with multiple predictors
regress(airquality, Ozone, Wind, Solar.R, Temp, Month, Day,
        robust = TRUE, plot = TRUE)