tmleMSM: Targeted Maximum Likelihood Estimation of Parameter of MSM

Description

Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y,A,W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable, (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid regression formula for regressing Y on any combination of A, V, W, T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument. Observation weigths (sampling weights) may optionally be provided

Usage

tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, 
        v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), 
        Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), 
        cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, 
        gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, 
	g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	g.Delta.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	ub = sqrt(sum(Delta))* log(sum(Delta)) / 5, family = "gaussian", 
	fluctuation = "logistic", alpha  = 0.995, id = 1:length(Y), 
	V.Q = 10, V.g = 10, V.Delta = 10, inference = TRUE, verbose = FALSE, 
	Q.discreteSL = FALSE, g.discreteSL = FALSE, alpha.sig = 0.05, obsWeights = NULL)

Value

psi: MSM parameter estimate
sigma: variance covariance matrix
se: standard errors extracted from sigma
pvalue: two-sided p-value
lb: lower bound on 95% confidence interval
ub: upper bound on 95% confidence interval
epsilon: fitted value of epsilon used to target initial Q
psi.Qinit: MSM parameter estimate based on untargeted initial Q
Qstar: targeted estimate of Q, an $n \times 2$ matrix with predicted values for Q(0,W), Q(1,W) using the updated fit
Qinit: initial estimate of Q. Qinit$coef are the coefficients for a glm model for Q, if applicable. Qinit$Q is an $n \times 2$ matrix, where n is the number of observations. Columns contain predicted values for Q(0,W),Q(1,W) using the initial fit. Qinit$type is method for estimating Q
g: treatment mechanism estimate. A list with three items: g$g1W contains estimates of $P(A=1|W)$ for each observation, g$coef the coefficients for the model for $g$ when glm used, g$type estimation procedure
g.AV: estimate for h(A,V) or h(A,T). A list with three items: g.AV$g1W an $n \times 2$ matrix containing values of $P(A=0|V,T), P(A=1|V,T)$ for each observation, g.AV$coef the coefficients for the model for $g$ when glm used, g.AV$type estimation procedure
g_Delta: missingness mechanism estimate. A list with three items: g_Delta$g1W an $n \times 2$ matrix containing values of $P(Delta=1|A,V,W,T)$ for each observation, g_Delta$coef the coefficients for the model for $g$ when glm used, g_Delta$type estimation procedure

Arguments

Y: continuous or binary outcome variable
A: binary treatment indicator, 1 - treatment, 0 - control
W: vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed.
V: vector, matrix, or dataframe of covariates used to define strata
T: optional time for repeated measures data
Delta: indicator of missing outcome or treatment assignment. 1 - observed, 0 - missing
MSM: MSM of interest, specified as valid right hand side of a regression formula (see examples)
v: optional value defining the strata of interest ($V=v$) for stratified estimation of MSM parameter
Q: optional $n \times 2$ matrix of initial values for $Q$ portion of the likelihood, $(E(Y|A=0,W), E(Y|A=1,W))$
Qform: optional regression formula for estimation of $E(Y|A, W)$, suitable for call to glm
Qbounds: vector of upper and lower bounds on Y and predicted values for initial Q
Q.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation of initial Q
cvQinit: logical, if TRUE, estimates cross-validated predicted values using discrete super learning, default=TRUE
hAV: optional $n \times 2$ matrix used in numerator of weights for updating covariate and the influence curve. If unspecified, defaults to conditional probabilities $P(A=1|V)$ or $P(A=1|T)$, for repeated measures data. For unstabilized weights, pass in an $n \times 2$ matrix of all 1s
hAVform: optionalregression formula of the form A~V+T, if specified this overrides the call to SuperLearner
g1W: optional vector of conditional treatment assingment probabilities, $P(A=1|W)$
gform: optional regression formula of the form A~W, if specified this overrides the call to SuperLearner
pDelta1: optional $n \times 2$ matrix of conditional probabilities for missingness mechanism,$P(Delta=1|A=0,V,W,T), P(Delta=1|A=1,V,W,T)$.
g.Deltaform: optional regression formula of the form Delta~A+W, if specified this overrides the call to SuperLearner
g.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation of g1W
g.Delta.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation ofpDelta1
ub: upper bound on inverse probability weights. See Details section for more information
family: family specification for working regression models, generally ‘gaussian’ for continuous outcomes (default), ‘binomial’ for binary outcomes
fluctuation: ‘logistic’ (default), or ‘linear’
alpha: used to keep predicted initial values bounded away from (0,1) for logistic fluctuation
id: optional subject identifier
V.Q: number of cross-validation folds for Super Learner estimation of Q
V.g: number of cross-validation folds for Super Learner estimation of g
V.Delta: number of cross-validation folds for Super Learner estimation of g_Delta
inference: if TRUE, variance-covariance matrix, standard errors, pvalues, and 95% confidence intervals are calculated. Setting to FALSE saves a little time when bootstrapping.
verbose: status messages printed if set to TRUE (default=FALSE)
Q.discreteSL: If true, use discrete SL to estimate Q, otherwise ensembleSL by default. Ignored when SL is not used.
g.discreteSL: If true, use discrete SL to estimate each component of g, otherwise ensembleSL by default. Ignored when SL is not used.
alpha.sig: significance level for constructing 1-alpha.sig confidence intervals
obsWeights: optional weights for biased sampling and two-stage designs.

Author

Susan Gruber sgruber@cal.berkeley.edu, in collaboration with Mark van der Laan.

Details

ub bounds the IC by bounding the factor $h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)]$ between 0 and ub, default value based on sample size.

References

1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. https://www.jstatsoft.org/v51/i13/

2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics,6(2), 2010.

3. Gruber, S., Phillips, R.V., Lee, H., van der Laan, M.J. Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse Probability Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects. American Journal of Epidemiology 2022; 191(9), 1640-1651.

Examples

Run this code

library(tmle)
# Example 1. Estimating MSM parameter with correctly specified regression formulas
# MSM: psi0 + psi1*A + psi2*V + psi3*A*V  (saturated)
# true parameter value: psi = (0, 1, -2, 0.5) 
# generate data
  set.seed(100)
  n <- 1000
  W <- matrix(rnorm(n*3), ncol = 3)
  colnames(W) <- c("W1", "W2", "W3")
  V <- rbinom(n, 1, 0.5)
  A <- rbinom(n, 1, 0.5)
  Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V))
  result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial")
  print(result.ex1)
if (FALSE) {

# Example 2. Biased sampling from example 1 population
# (observations having V = 1 twice as likely to be included in the dataset
  retain.ex2 <- sample(1:n, size = n/2, p = c(1/3 + 1/3*V))
  wt.ex2 <- 1/(1/3 + 1/3*V)
  result.ex2 <- tmleMSM(Y[retain.ex2], A[retain.ex2], W[retain.ex2,], 
			V[retain.ex2], MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial",
			obsWeight = wt.ex2[retain.ex2])
  print(result.ex2)

# Example 3. Repeated measures data, two observations per id
# (e.g., crossover study design)
# MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
# true parameter value: psi = (-2, 1, 0, -2, 0 )
# generate data in wide format (id,  W1, Y(t),  W2(t), V(t), A(t)) 
   set.seed(10)
   n <- 250
   id <- rep(1:n)
   W1   <- rbinom(n, 1, 0.5)
   W2.1 <- rnorm(n)
   W2.2 <- rnorm(n)  
   V.1   <- rnorm(n)  
   V.2   <- rnorm(n)
   A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1))
   A.2 <- 1-A.1
   Y.1  <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n)
   Y.2  <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n)
   d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2)

# change dataset from wide to long format
   longd <- reshape(d, 
          varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)),
          idvar = "id",
          direction = "long",
          timevar = "T",
          new.row.names = NULL,
          sep = "")
# misspecified model for initial Q, partial misspecification for g. 
# V set to 2 for Q and g to save time, not recommended at this sample size
   result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, 
          T = longd$T, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2", 
	id = longd$id, V.Q=2, V.g=2)
   print(result.ex3)


# Example 4:  Introduce 20
# V set to 2 for Q and g to save time, not recommended at this sample size
  Delta <- rbinom(nrow(longd), 1, 0.8)
  result.ex4 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T=longd$T,
          Delta = Delta, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2",
	  g.Deltaform = "Delta ~ 1", id=longd$id, verbose = TRUE, V.Q=2, V.g=2)
  print(result.ex4)

}

Run the code above in your browser using DataLab