Learn R Programming

bsts (version 0.9.5)

bsts: Bayesian Structural Time Series

Description

Uses MCMC to sample from the posterior distribution of a Bayesian structural time series model. This function can be used either with or without contemporaneous predictor variables (in a time series regression).

If predictor variables are present, the regression coefficients are fixed (as opposed to time varying, though time varying coefficients might be added as state component). The predictors and response in the formula are contemporaneous, so if you want lags and differences you need to put them in the predictor matrix yourself.

If no predictor variables are used, then the model is an ordinary state space time series model.

The model allows for several useful extensions beyond standard Bayesian dynamic linear models.

  • A spike-and-slab prior is used for the (static) regression component of models that include predictor variables. This is especially useful with large numbers of regressor series.

  • Both the spike-and-slab component (for static regressors) and the Kalman filter (for components of time series state) require observations and state variables to be Gaussian. The bsts package allows for non-Gaussian error families in the observation equation (as well as some state components) by using data augmentation to express these families as conditionally Gaussian.

  • As of version 0.7.0, bsts supports having multiple observations at the same time point. In this case the basic model is taken to be

    $$y_{t,j} = Z_t^T \alpha_t + \beta^Tx_{t, j} + \epsilon_{t,j}.$$

    This is a regression model where all observations with the same time point share a common time series effect.

Usage

bsts(formula,
     state.specification,
     family = c("gaussian", "logit", "poisson", "student"),
     data,
     prior,
     contrasts = NULL,
     na.action = na.pass,
     niter,
     ping = niter / 10,
     model.options = BstsOptions(),
     timestamps = NULL,
     seed = NULL,
     ...)

Arguments

formula

A formula describing the regression portion of the relationship between y and X.

If no regressors are desired then the formula can be replaced by a numeric vector giving the time series to be modeled. Missing values are not allowed in predictors, but they are allowed in the response variable.

If the response variable is of class zoo, xts, or ts, then the time series information it contains will be used in many of the plotting methods called from plot.bsts.

state.specification

A list with elements created by AddLocalLinearTrend, AddSeasonal, and similar functions for adding components of state. See the help page for state.specification.

family

The model family for the observation equation. Non-Gaussian model families use data augmentation to recover a conditionally Gaussian model.

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which bsts is called.

prior

If regressors are supplied in the model formula, then this is a prior distribution for the regression component of the model, as created by SpikeSlabPrior. The prior for the time series component of the model will be specified during the creation of state.specification. This argument is only used if a formula is specified.

If the model contains no regressors, then this is simply the prior on the residual standard deviation, expressed as an object created by SdPrior.

contrasts

An optional list containing the names of contrast functions to use when converting factors numeric variables in a regression formula. This argument works exactly as it does in lm. The names of the list elements correspond to factor variables in your model formula. The list elements themselves are the names of contrast functions (see help(contr.treatment) and the contrasts.arg argument to model.matrix.default). This argument is only used if a model formula is specified, and even then the default is probably what you want.

na.action

What to do about missing values. The default is to allow missing responses, but no missing predictors. Set this to na.omit or na.exclude if you want to omit missing responses altogether.

niter

A positive integer giving the desired number of MCMC draws.

ping

A scalar giving the desired frequency of status messages. If ping > 0 then the program will print a status message to the screen every ping MCMC iterations.

model.options

An object (list) returned by BstsOptions. See that function for details.

timestamps

The timestamp associated with each value of the response. This argument is primarily useful in cases where the response has missing gaps, or where there are multiple observations per time point. If the response is a "regular" time series with a single observation per time point then you can leave this argument as NULL. In that case, if either the response or the data argument is a type convertible to zoo then timestamps will be inferred.

seed

An integer to use as the random seed for the underlying C++ code. If NULL then the seed will be set using the clock.

...

Extra arguments to be passed to SpikeSlabPrior (see the entry for the prior argument, above).

Value

An object of class bsts which is a list with the following components

coefficients

A niter by ncol(X) matrix of MCMC draws of the regression coefficients, where X is the design matrix implied by formula. This is only present if a model formula was supplied.

sigma.obs

A vector of length niter containing MCMC draws of the residual standard deviation.

The returned object will also contain named elements holding the MCMC draws of model parameters belonging to the state models. The names of each component are supplied by the entries in state.specification. If a model parameter is a scalar, then the list element is a vector with niter elements. If the parameter is a vector then the list element is a matrix with niter rows. If the parameter is a matrix then the list element is a 3-way array with first dimension niter.

Finally, if a model formula was supplied, then the returned object will contain the information necessary for the predict method to build the design matrix when a new prediction is made.

Details

If the model family is logit, then there are two ways one can format the response variable. If the response is 0/1, TRUE/FALSE, or 1/-1, then the response variable can be passed as with any other model family. If the response is a set of counts out of a specified number of trials then it can be passed as a two-column matrix, where the first column contains the counts of successes and the second contains the count of failures.

Likewise, if the model family is Poisson, the response can be passed as a single vector of counts, under the assumption that each observation has unit exposure. If the exposures differ across observations, then the resopnse can be a two column matrix, with the first column containing the event counts and the second containing exposure times.

References

Harvey (1990), "Forecasting, structural time series, and the Kalman filter", Cambridge University Press.

Durbin and Koopman (2001), "Time series analysis by state space methods", Oxford University Press.

Goerge and McCulloch (1997) "Approaches for Bayesian variable selection", Statistica Sinica pp 339--374.

Ghosh and Clyde (2011) "Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: a novel data augmentation approach", JASA pp 1041 --1052.

See Also

bsts, AddLocalLevel, AddLocalLinearTrend, AddSemilocalLinearTrend, AddSeasonal AddDynamicRegression SpikeSlabPrior, SdPrior.

Examples

Run this code
# NOT RUN {
  ## Example 1:  Time series (ts) data
  data(AirPassengers)
  y <- log(AirPassengers)
  ss <- AddLocalLinearTrend(list(), y)
  ss <- AddSeasonal(ss, y, nseasons = 12)
  model <- bsts(y, state.specification = ss, niter = 500)
  pred <- predict(model, horizon = 12, burn = 100)
  par(mfrow = c(1,2))
  plot(model)
  plot(pred)

# }
# NOT RUN {
  MakePlots <- function(model, ask = TRUE) {
    ## Make all the plots callable by plot.bsts.
    opar <- par(ask = ask)
    on.exit(par(opar))
    plot.types <- c("state", "components", "residuals",
                    "prediction.errors", "forecast.distribution")
    for (plot.type in plot.types) {
      plot(model, plot.type)
    }
    if (model$has.regression) {
      regression.plot.types <- c("coefficients", "predictors", "size")
      for (plot.type in regression.plot.types) {
        plot(model, plot.type)
      }
    }
  }

  ## Example 2: GOOG is the Google stock price, an xts series of daily
  ##            data.
  data(goog)
  ss <- AddSemilocalLinearTrend(list(), goog)
  model <- bsts(goog, state.specification = ss, niter = 500)
  MakePlots(model)

  ## Example 3:  Change GOOG to be zoo, and not xts.
  goog <- zoo(goog, index(goog))
  ss <- AddSemilocalLinearTrend(list(), goog)
  model <- bsts(goog, state.specification = ss, niter = 500)
  MakePlots(model)

  ## Example 4:  Naked numeric data works too
  y <- rnorm(100)
  ss <- AddLocalLinearTrend(list(), y)
  model <- bsts(y, state.specification = ss, niter = 500)
  MakePlots(model)

  ## Example 5:  zoo data with intra-day measurements
  y <- zoo(rnorm(100),
           seq(from = as.POSIXct("2012-01-01 7:00 EST"), len = 100, by = 100))
  ss <- AddLocalLinearTrend(list(), y)
  model <- bsts(y, state.specification = ss, niter = 500)
  MakePlots(model)

\dontrun{
  ## Example 6:  Including regressors
  data(iclaims)
  ss <- AddLocalLinearTrend(list(), initial.claims$iclaimsNSA)
  ss <- AddSeasonal(ss, initial.claims$iclaimsNSA, nseasons = 52)
  model <- bsts(iclaimsNSA ~ ., state.specification = ss, data =
                initial.claims, niter = 1000)
  plot(model)
  plot(model, "components")
  plot(model, "coefficients")
  plot(model, "predictors")
}
# }
# NOT RUN {
# }
# NOT RUN {
  ## Example 7:  Regressors with multiple time stamps.
  number.of.time.points <- 50
  sample.size.per.time.point <- 10
  total.sample.size <- number.of.time.points * sample.size.per.time.point
  sigma.level <- .1
  sigma.obs <- 1

  ## Simulate some fake data with a local level state component.
  trend <- cumsum(rnorm(number.of.time.points, 0, sigma.level))
  predictors <- matrix(rnorm(total.sample.size * 2), ncol = 2)
  colnames(predictors) <- c("X1", "X2")
  coefficients <- c(-10, 10)
  regression <- as.numeric(predictors %*% coefficients)
  y.hat <- rep(trend, sample.size.per.time.point) + regression
  y <- rnorm(length(y.hat), y.hat, sigma.obs)

  ## Create some time stamps, with multiple observations per time stamp.
  first <- as.POSIXct("2013-03-24")
  dates <- seq(from = first, length = number.of.time.points, by = "month")
  timestamps <- rep(dates, sample.size.per.time.point)

  ## Run the model with a local level trend, and an unnecessary seasonal component.
  ss <- AddLocalLevel(list(), y)
  ss <- AddSeasonal(ss, y, nseasons = 7)
  model <- bsts(y ~ predictors, ss, niter = 250, timestamps = timestamps,
                seed = 8675309)
  plot(model)
  plot(model, "components")
# }
# NOT RUN {
## Example 8: Non-Gaussian data
## Poisson counts of shark attacks in Florida.
data(shark)
logshark <- log1p(shark$Attacks)
ss.level <- AddLocalLevel(list(), y = logshark)
model <- bsts(shark$Attacks, ss.level, niter = 500,
              ping = 250, family = "poisson", seed = 8675309)

## Poisson data can have an 'exposure' as the second column of a
## two-column matrix.
model <- bsts(cbind(shark$Attacks, shark$Population / 1000),
              state.specification = ss.level, niter = 500, 
              family = "poisson", ping = 250, seed = 8675309)

# }

Run the code above in your browser using DataLab