neWeight.default: Expand the dataset and calculate ratio-of-mediator probability weights

Description

This function both expands the data along hypothetical exposure values and calculates ratio-of-mediator probability weights.

Usage

# S3 method for default
neWeight(
  object,
  formula,
  data,
  nRep = 5,
  xSampling = c("quantiles", "random"),
  xFit,
  percLim = c(0.05, 0.95),
  ...
)

Value

A data frame of class c("data.frame", "expData", "weightData"). See expData for its structure.

Arguments

object: fitted model object representing the mediator model.
formula: a formula object providing a symbolic description of the mediator model. Redundant if already specified in the call that produced the fitted model passed as object (see details).
data: data, as matrix or data frame, containing the exposure (and other relevant) variables. Redundant if already specified in the call that produced the fitted model passed as object (see details).
nRep: number of replications or hypothetical values of the exposure to sample for each observation unit.
xSampling: character string indicating how to sample from the conditional exposure distribution. Possible values are "quantiles" or "random" (see details).
xFit: an optional fitted object (preferably glm) for the conditional exposure distribution (see details).
percLim: a numerical vector of the form c(lower, upper) indicating the most extreme percentiles to sample from the conditional exposure distribution when xSampling = "quantiles" (see details).
...: additional arguments.

Details

The calculated weights are ratios of fitted probabilities or probability densities from the distribution of the mediator model. This model needs to be specified as a fitted object in the object argument.
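The idea of a ratio-of-mediator probability weight can be sketched in base R. This is only an illustration on simulated data, not the package's internal code: the variable names (X, C1, M), the Gaussian mediator model and the choice of "flipping" a binary exposure to obtain the hypothetical value are all assumptions made for the example. The weight is taken as the mediator density evaluated at the hypothetical exposure divided by the density evaluated at the observed exposure.

```r
# Simulated data; all variable names are hypothetical.
set.seed(1)
n   <- 200
C1  <- rnorm(n)
X   <- rbinom(n, 1, plogis(0.5 * C1))
M   <- 0.8 * X + 0.3 * C1 + rnorm(n)
dat <- data.frame(M, X, C1)

# Mediator model of the form M ~ X + C1 (Gaussian family by default).
fitM  <- glm(M ~ X + C1, data = dat)
sigma <- sqrt(summary(fitM)$dispersion)  # estimated residual standard deviation

# Hypothetical exposure: here simply the opposite of the observed binary value.
datStar <- transform(dat, X = 1 - X)

# Fitted mediator means under the observed and the hypothetical exposure.
muObs  <- predict(fitM, newdata = dat,     type = "response")
muStar <- predict(fitM, newdata = datStar, type = "response")

# Ratio of fitted densities, evaluated at each unit's observed mediator value.
w <- dnorm(dat$M, mean = muStar, sd = sigma) /
     dnorm(dat$M, mean = muObs,  sd = sigma)
head(w)
```

Rows in which the hypothetical exposure equals the observed exposure would receive a weight of exactly 1 under this construction.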

If the model-fitting function used to fit the mediator model does not require specification of a formula or data argument, these need to be specified explicitly in order to enable neWeight.default to extract pointers to variable types relevant for mediation analysis.

Whether a formula is specified externally (in the call for the fitted mediator model object which is specified in object) or internally (via the formula argument), it always needs to be of the form M ~ X + C1 + C2, with predictor variables entered in the following prespecified order:

exposure X: The first predictor is coded as exposure or treatment.
baseline covariates C: All remaining predictor variables are automatically coded as baseline covariates.

It is important to adhere to this prespecified order to enable neWeight to create valid pointers to these different types of predictor variables. This requirement extends to the use of operators other than the + operator, such as the : and * operators (e.g. when adding interaction terms). For instance, the formula specifications M ~ X * C1 + C2, M ~ X + C1 + X:C1 + C2 and M ~ X + X:C1 + C1 + C2 will create identical pointers to the different types of variables, as the order of the unique predictor variables is identical in all three specifications.
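The claim about the order of unique predictor variables can be checked with base R. The sketch below uses all.vars from base R to list the variables in order of first appearance. Whether neWeight determines the order in exactly this way is an assumption; the snippet only shows that the three specifications share the same ordering.

```r
# Three formula specifications with the same unique predictor order.
f1 <- M ~ X * C1 + C2
f2 <- M ~ X + C1 + X:C1 + C2
f3 <- M ~ X + X:C1 + C1 + C2

# all.vars() returns variable names in order of first appearance;
# dropping the first element removes the response.
vars <- lapply(list(f1, f2, f3), function(f) all.vars(f)[-1])
vars
```

In each case the first predictor is X (the exposure), followed by C1 and C2 (the baseline covariates).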

Furthermore, categorical exposures that are not coded as factors in the original dataset should be specified as factors in the formula, using the factor function, e.g. M ~ factor(X) + C1 + C2. Quadratic or higher-order polynomial terms can be included as well, using either the I function or the poly function. For instance, M ~ X + I(X^2) + C1 + C2 and M ~ poly(X, 2, raw = TRUE) + C1 + C2 are equivalent and result in identical pointers to the different types of variables.
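That the two polynomial specifications describe the same model can be verified on simulated data. The data and coefficient values below are made up for illustration; only glm, I and poly from base R are used.

```r
# Simulated data; variable names follow the M ~ X + C1 + C2 convention.
set.seed(2)
n     <- 100
dat   <- data.frame(X = rnorm(n), C1 = rnorm(n), C2 = rbinom(n, 1, 0.5))
dat$M <- 0.5 * dat$X - 0.2 * dat$X^2 + 0.3 * dat$C1 + rnorm(n)

# Quadratic exposure term specified in two ways.
fitI    <- glm(M ~ X + I(X^2) + C1 + C2, data = dat)
fitPoly <- glm(M ~ poly(X, 2, raw = TRUE) + C1 + C2, data = dat)

# Fitted values and coefficients agree (coefficient names differ, so drop them).
sameFit  <- all.equal(unname(fitted(fitI)), unname(fitted(fitPoly)))
sameCoef <- all.equal(unname(coef(fitI)),   unname(coef(fitPoly)))
c(sameFit, sameCoef)
```

Note that raw = TRUE matters here: without it, poly uses orthogonal polynomials, which give the same fitted values but different coefficients.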

The command terms(object, "vartype") (with object replaced by the name of the resulting expanded dataset) can be used to check whether valid pointers have been created.

In contrast to mediator models with categorical exposures, additional arguments need to be specified if the exposure is continuous. All of these additional arguments are related to the sampling procedure for the exposure.

Whereas the number of replications nRep for categorical variables equals the number of levels for the exposure coded as a factor (i.e. the number of hypothetical exposure values), the number of desired replications needs to be specified explicitly for continuous exposures. Its default is 5.
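For a categorical exposure, the expansion can be pictured as replicating each unit once per exposure level. The following base R sketch is illustrative only: the column names id and xStar are hypothetical and do not reproduce the structure of the expData object returned by neWeight.

```r
# Toy data with a three-level exposure coded as a factor.
dat <- data.frame(
  id = 1:4,
  X  = factor(c("a", "b", "c", "a")),
  C1 = c(0.1, -0.4, 1.2, 0.7)
)

lev  <- levels(dat$X)
nRep <- nlevels(dat$X)  # number of hypothetical exposure values

# Replicate every row nRep times and attach one hypothetical level per copy.
expanded <- dat[rep(seq_len(nrow(dat)), each = nRep), ]
expanded$xStar <- factor(rep(lev, times = nrow(dat)), levels = lev)
rownames(expanded) <- NULL
expanded
```

Each of the four units appears three times, once for every level of the exposure.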

If xFit is left unspecified, the hypothetical exposure levels are automatically sampled from a linear model for the exposure, conditional on a linear combination of all covariates. If one wishes to use another model for the exposure, this default model specification can be overruled by referring to a fitted model object in the xFit argument. Misspecification of this sampling model does not induce bias in the estimated coefficients and standard errors of the natural effect model.

The xSampling argument specifies how the hypothetical exposure levels should be sampled from the conditional exposure distribution (which is either entered explicitly using the xFit argument or fitted automatically as described in the previous paragraph). The "random" option randomly samples nRep draws from the exposure distribution, whereas the "quantiles" option (default) samples nRep quantiles at equal-sized probability intervals. Hence, only the latter yields fixed exposure levels given nRep and xFit.
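The difference between the two sampling options can be sketched in base R for a continuous exposure. This is an illustration under assumptions, not the package's implementation: it assumes a Gaussian linear model for the exposure given the covariates, and the object names (xQuant, xRand) are hypothetical.

```r
# Simulated continuous exposure depending on two covariates.
set.seed(3)
n     <- 50
dat   <- data.frame(C1 = rnorm(n), C2 = rnorm(n))
dat$X <- 1 + 0.5 * dat$C1 - 0.3 * dat$C2 + rnorm(n)

# Linear model for the exposure, conditional on all covariates.
xFit  <- glm(X ~ C1 + C2, data = dat)
mu    <- fitted(xFit)
sigma <- sqrt(summary(xFit)$dispersion)

nRep  <- 5
probs <- seq(0.05, 0.95, length.out = nRep)

# "quantiles": nRep fixed quantiles of each unit's conditional distribution
# (an n x nRep matrix, one column per probability).
xQuant <- sapply(probs, function(p) qnorm(p, mean = mu, sd = sigma))

# "random": nRep random draws from each unit's conditional distribution.
xRand <- matrix(rnorm(n * nRep, mean = rep(mu, nRep), sd = sigma), nrow = n)

head(xQuant)
head(xRand)
```

Rerunning the "quantiles" step always gives the same values for a given nRep and xFit, whereas the "random" draws change unless the seed is fixed.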

In order to guarantee that the entire support of the distribution is being sampled (which might be a concern if nRep is chosen to be small), the default lower and upper sampled quantiles are the 5th and 95th percentiles. The intermediate quantiles correspond to equal-sized probability intervals. So, for instance, if nRep = 4, then the sampled quantiles will correspond to probabilities 0.05, 0.35, 0.65 and 0.95. These default 'outer' quantiles can be changed by specifying the percLim argument accordingly. By specifying percLim = NULL, the standard quantiles will be sampled (e.g., 0.2, 0.4, 0.6 and 0.8 if nRep = 4).
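The probabilities quoted above can be reproduced with a small helper. The function quantileProbs is hypothetical and written only for this illustration; it mirrors the description in the text rather than the package's source code.

```r
# Hypothetical helper: probabilities at which quantiles are sampled.
quantileProbs <- function(nRep, percLim = c(0.05, 0.95)) {
  if (is.null(percLim)) {
    # Standard quantiles: nRep interior points of nRep + 1 equal intervals.
    seq_len(nRep) / (nRep + 1)
  } else {
    # Equal-sized probability intervals between the outer percentiles.
    seq(percLim[1], percLim[2], length.out = nRep)
  }
}

pDefault  <- quantileProbs(4)                  # 0.05 0.35 0.65 0.95
pStandard <- quantileProbs(4, percLim = NULL)  # 0.2 0.4 0.6 0.8
pDefault
pStandard
```

With the defaults, the spacing between successive probabilities is (0.95 - 0.05) / (nRep - 1), which equals 0.3 for nRep = 4.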

Examples

data(UPBdata)

## example using glm
fit.glm <- glm(negaff ~ att + gender + educ + age, data = UPBdata)
weightData <- neWeight(fit.glm, nRep = 2)
# \dontshow{
library(VGAM) 
fit1 <- glm(negaff ~ att + gender + educ + age, data = UPBdata)
expData1 <- neWeight(fit1)
w1 <- attr(expData1, "weights")
expData1f <- neWeight(negaff ~ att + gender + educ + age, data = UPBdata)
w1f <- attr(expData1f, "weights")
head(expData1); head(expData1f)
head(w1); head(w1f)

##

UPBdata$negaff2 <- cut(UPBdata$negaff, breaks = 2, labels = c("low", "high"))
fit2 <- glm(negaff2 ~ att + gender + educ + age, family = binomial, data = UPBdata)
expData2 <- neWeight(fit2)
w2 <- attr(expData2, "weights")
expData2f <- neWeight(negaff2 ~ att + gender + educ + age, family = binomial, data = UPBdata)
w2f <- attr(expData2f, "weights")
head(expData2); head(expData2f)
head(w2); head(w2f)

# test vgam
fit2b <- vgam(negaff2 ~ att + gender + educ + age, family = binomialff, data = UPBdata)
expData2b <- neWeight(fit2b)
head(attr(expData2, "weights")); head(attr(expData2b, "weights"))
fit2b <- vgam(negaff2 ~ s(att) + gender + educ + age, family = binomialff, data = UPBdata)
expData2b <- neWeight(fit2b)
head(attr(expData2, "weights")); head(attr(expData2b, "weights"))
expData2bf <- neWeight(negaff2 ~ s(att) + gender + educ + age, FUN = vgam, family = binomialff, data = UPBdata)
head(attr(expData2b, "weights")); head(attr(expData2bf, "weights"))
##

UPBdata$negaff3 <- cut(UPBdata$negaff, breaks = 3, labels = c("low", "moderate", "high"))
UPBdata$negaff3 <- as.numeric(UPBdata$negaff3)
fit3 <- glm(negaff3 ~ att + gender + educ + age, family = "poisson", data = UPBdata)
expData3 <- neWeight(fit3)
w3 <- attr(expData3, "weights")
expData3f <- neWeight(negaff3 ~ att + gender + educ + age, family = poisson, data = UPBdata)
w3f <- attr(expData3f, "weights")
head(expData3); head(expData3f)
head(w3); head(w3f)

# test vgam
fit3b <- vgam(negaff3 ~ att + gender + educ + s(age), family = poissonff, data = UPBdata)
expData3b <- neWeight(fit3b)
head(attr(expData3, "weights")); head(attr(expData3b, "weights"))
fit3b <- vgam(negaff3 ~ s(att) + gender + educ + age, family = poissonff, data = UPBdata)
expData3b <- neWeight(fit3b)
head(attr(expData3, "weights")); head(attr(expData3b, "weights"))
expData3bf <- neWeight(negaff3 ~ s(att) + gender + educ + age, FUN = vgam, family = poissonff, data = UPBdata)
head(attr(expData3b, "weights")); head(attr(expData3bf, "weights"))
# }