
dma (version 1.4-0)

logistic.dma: Dynamic model averaging for binary outcomes

Description

Implements dynamic model averaging for binary outcomes as described in McCormick et al. (2011, Biometrics). The analysis can be performed either for all data at once (using logistic.dma) or dynamically, one observation at a time (by combining the remaining functions; see Examples). In addition to the values described below, plot() creates a plot of the posterior model probabilities over time and of the model-averaged fitted values (with a smooth curve overlay), and print() returns the model matrix and the posterior model probabilities. There are K candidate models, T time points, and d total covariates (including the intercept).

Usage

logistic.dma(x, y, models.which, lambda = 0.99, alpha = 0.99, autotune = TRUE,
    initmodelprobs = NULL, initialsamp = NULL)
 
logdma.init(x, y, models.which)

logdma.predict(fit, newx)

logdma.update(fit, newx, newy, lambda = 0.99, autotune = TRUE)

logdma.average(fit, alpha = 0.99, initmodelprobs = NULL)

Arguments

x

T by (d-1) matrix of observed covariates. Note that a column of 1's is added automatically for the intercept. In logdma.init, this matrix contains only the training set.

y

T vector of binary responses. In logdma.init, these correspond to the training set only.

models.which

K by (d-1) matrix defining models. A 1 indicates that a covariate is included in a particular model, and a 0 that it is excluded. Model averaging is done over all models specified in models.which.

lambda

scalar forgetting factor within each model

alpha

scalar forgetting factor for model averaging

autotune

T/F indicating whether or not the automatic tuning procedure described in McCormick et al. should be applied. Default is TRUE.

initmodelprobs

K vector of starting probabilities for model averaging. If NULL (default), 1/K is used for each model.

initialsamp

scalar indicating how many observations to use for generating initial values. If NULL (default), the first 10 percent of observations are used.

newx, newy

Subset of x and y corresponding to new observations.

fit

List of estimation results, as returned by logdma.init, logdma.update or logdma.average.
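The models.which and initmodelprobs arguments can be built with base R alone. The following is a minimal sketch, assuming three covariates; the object names (n_cov, models_manual, models_all, init_probs) are illustrative and not part of the package. Dropping the all-zero row is a choice made for this sketch, not a documented requirement.

```r
# Hypothetical setup: 3 covariates, so d - 1 = 3 (the intercept is not a column)
n_cov <- 3

# Hand-written set of K = 3 candidate models, one row per model
models_manual <- matrix(c(1, 1, 1,   # model 1: all covariates
                          1, 0, 0,   # model 2: first covariate only
                          0, 1, 1),  # model 3: second and third covariates
                        nrow = 3, ncol = n_cov, byrow = TRUE)

# Every non-empty subset of the covariates: 2^3 - 1 = 7 candidate models
all_subsets <- as.matrix(expand.grid(rep(list(0:1), n_cov)))
models_all <- all_subsets[rowSums(all_subsets) > 0, , drop = FALSE]
dimnames(models_all) <- NULL

# Uniform starting probabilities, matching the documented default of 1/K
K <- nrow(models_all)
init_probs <- rep(1 / K, K)
```

Either matrix could then be passed as models.which, with init_probs as initmodelprobs when the full enumeration is used.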

Value

Functions logistic.dma and logdma.average return an object of class logistic.dma. Functions logdma.init and logdma.update return a list of estimation results that is a subset of the logistic.dma object. The logistic.dma object has the following components:

x

T by (d-1) matrix of covariates

y

T by 1 vector of binary responses

models.which

K by (d-1) matrix of candidate models

lambda

scalar, tuning factor within models

alpha

scalar, tuning factor for model averaging

autotune

T/F, indicator of whether or not to use autotuning algorithm

alpha.used

T vector of alpha values used

theta

K by T by d array of dynamic logistic regression estimates for each model

vartheta

K by T by d array of dynamic logistic regression variances for each model

pmp

K by T array of posterior model probabilities

yhatdma

T vector of model-averaged predictions

yhatmodel

K by T array of fitted values for each model

Function logdma.predict returns a matrix with predictions corresponding to the newx covariates.
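To illustrate how the K by T components relate to each other, the sketch below combines toy per-model fitted values with toy posterior model probabilities. It assumes that the model-averaged prediction at each time point is the probability-weighted sum of the per-model fitted values; the package may differ in detail (for example, in which time point's probabilities are used as weights), so treat this as an illustration rather than a description of its internals. The matrices stand in for the pmp and yhatmodel components and are made up.

```r
# Toy stand-ins for the pmp and yhatmodel components (K = 2 models, T = 4 time points)
pmp <- matrix(c(0.3, 0.6, 0.8, 0.9,
                0.7, 0.4, 0.2, 0.1), nrow = 2, byrow = TRUE)
yhatmodel <- matrix(c(0.2, 0.3, 0.4, 0.5,
                      0.6, 0.7, 0.8, 0.9), nrow = 2, byrow = TRUE)

# Each column of pmp is a probability distribution over the K models
stopifnot(all(abs(colSums(pmp) - 1) < 1e-12))

# Assumed relationship: weight each model's fitted value by its posterior probability
yhat_avg <- colSums(pmp * yhatmodel)

# Index of the model with the highest posterior probability at each time point
best_model <- apply(pmp, 2, which.max)
```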

Details

The function logistic.dma is composed of three parts, which can also be used separately. First, the model is trained on a subset of the data (function logdma.init), where the size of the training set is determined by initialsamp. Note that the arguments x and y in logdma.init should contain the training subset only. Then, the estimates are updated with new observations (function logdma.update). Lastly, dynamic model averaging is performed on the final estimates (function logdma.average). The updating, averaging and, in addition, predicting (logdma.predict) steps can be performed dynamically for one observation at a time; see Examples.
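The role of the forgetting factor alpha in the averaging step can be sketched in a few lines. The following follows the general form used in the dynamic model averaging literature: the previous model probabilities are raised to the power alpha and renormalised, which flattens them towards uniform, and are then reweighted by how well each model predicted the new observation. It is not taken from the package source; the function update_model_probs and the inputs probs and pred_lik are hypothetical.

```r
# One model-probability update with forgetting (sketch, not the package's internals)
# probs:    posterior model probabilities from the previous time point (length K)
# pred_lik: hypothetical predictive likelihood of the new observation under each model
# alpha:    forgetting factor; alpha = 1 means no forgetting
update_model_probs <- function(probs, pred_lik, alpha = 0.99) {
  flattened <- probs^alpha / sum(probs^alpha)  # prediction step: flatten towards uniform
  updated <- flattened * pred_lik              # update step: reweight by predictive fit
  updated / sum(updated)                       # renormalise so the result sums to 1
}

probs <- c(0.7, 0.2, 0.1)
pred_lik <- c(0.3, 0.6, 0.6)

no_forgetting <- update_model_probs(probs, pred_lik, alpha = 1)
strong_forgetting <- update_model_probs(probs, pred_lik, alpha = 0.5)
```

With alpha = 1 this reduces to an ordinary Bayesian update of the model probabilities. Smaller values discount the accumulated evidence, so a model that has been dominant loses weight faster once it starts predicting poorly.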

References

McCormick, T.M., Raftery, A.E., Madigan, D. and Burd, R.S. (2011) "Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification." Biometrics, 66:1162-1173.

Examples

# NOT RUN {
# simulate some data to test
# first, static coefficients
coef <- c(.08,-.4,-.1)
coefmat <- cbind(rep(coef[1],200),rep(coef[2],200),rep(coef[3],200))
# then, dynamic ones
coefmat <- cbind(coefmat,seq(1,.45,length.out=nrow(coefmat)),
            seq(-.75,-.15,length.out=nrow(coefmat)),
            c(rep(-1.5,nrow(coefmat)/2),rep(-.5,nrow(coefmat)/2)))
npar <- ncol(coefmat)-1

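# simulate data: the first 100 responses depend on the intercept and all five
# covariates, the last 100 only on the intercept and the last two covariates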
set.seed(1234)
dat <- matrix(rnorm(200*(npar),0,1),200,(npar))
ydat <- exp(rowSums((cbind(rep(1,nrow(dat)),dat))[1:100,]*coefmat[1:100,]))/
          (1+exp(rowSums(cbind(rep(1,nrow(dat)),dat)[1:100,]*coefmat[1:100,])))
y <- c(ydat,exp(rowSums(cbind(rep(1,nrow(dat)),dat)[-c(1:100),c(1,5,6)]*
               coefmat[-c(1:100),c(1,5,6)]))/
          (1+exp(rowSums(cbind(rep(1,nrow(dat)),dat)[-c(1:100),c(1,5,6)]*
               coefmat[-c(1:100),c(1,5,6)]))))
u <- runif(length(y))
y <- as.numeric(u < y)

# Consider three candidate models
mmat <- matrix(c(1,1,1,1,1,0,0,0,1,1,1,0,1,0,1),3,5, byrow = TRUE)

# Fit model and plot
# autotuning is turned off for this demonstration example
ldma.test <- logistic.dma(dat, y, mmat, lambda = .99, alpha = .99, 
    autotune = FALSE, initialsamp = 20)
plot(ldma.test)

# Using DMA in a "streaming" mode
modl <- logdma.init(dat[1:20,], y[1:20], mmat)
yhat <- matrix(0, ncol=3, nrow=200)
for(i in 21:200){
  # if prediction is desired, use logdma.predict
  yhat[i,] <- logdma.predict(modl, dat[i,])
  # update
  modl <- logdma.update(modl, dat[i,], y[i], 
                lambda = .99, autotune = FALSE)
}
# the averaging step could also be done within the loop above
ldma.stream <- logdma.average(modl, alpha = .99)
plot(ldma.stream)
# }
