gss (version 2.2-8)

ssanova9: Fitting Smoothing Spline ANOVA Models with Correlated Data

Description

Fit smoothing spline ANOVA models with correlated Gaussian data. The symbolic model specification via formula follows the same rules as in lm.

Usage

ssanova9(formula, type=NULL, data=list(), subset, offset,
         na.action=na.omit, partial=NULL, method="v", alpha=1.4,
         varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, cov,
         skip.iter=FALSE)

para.arma(fit)

Value

ssanova9 returns a list object of class c("ssanova9","ssanova").

The method summary.ssanova9 can be used to obtain summaries of the fits. The method predict.ssanova can be used to evaluate the fits at arbitrary points along with standard errors. The method project.ssanova9 can be used to calculate the Kullback-Leibler projection for model selection. The methods residuals.ssanova and fitted.ssanova extract the respective traits from the fits.

para.arma returns the fitted ARMA coefficients for cov=list("arma",c(p,q)) in the call to ssanova9.
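The methods listed above might be used together as in the following minimal sketch. It assumes the gss package is installed; the simulated data, the evaluation grid, and the object names (fit, grid, est) are illustrative only.

```r
# Sketch only: the body runs when the gss package is available.
if (requireNamespace("gss", quietly = TRUE)) {
  library(gss)
  set.seed(1)
  x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)

  # Fit with AR(1) errors, then read off the class and the ARMA coefficients.
  fit <- ssanova9(y ~ x, cov = list("arma", c(1, 0)))
  print(class(fit))
  print(para.arma(fit))

  # Evaluate the fit on a grid inside the observed range of x,
  # requesting standard errors along with the fitted values.
  grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
  est <- predict(fit, grid, se.fit = TRUE)
  str(est)
}
```

The grid is kept inside the observed range of x so that the evaluation points stay within the domain on which the spline was fitted.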

Arguments

formula

Symbolic description of the model to be fit.

type

List specifying the type of spline for each variable. See mkterm for details.

data

Optional data frame containing the variables in the model.

subset

Optional vector specifying a subset of observations to be used in the fitting process.

offset

Optional offset term with known parameter 1.

na.action

Function which indicates what should happen when the data contain NAs.

partial

Optional symbolic description of parametric terms in partial spline models.

method

Method for smoothing parameter selection. Supported are method="v" for V, method="m" for M, and method="u" for U; see the reference for definitions of U, V, and M.

alpha

Parameter modifying V or U; larger absolute values yield smoother fits. Ignored when method="m" is specified.

varht

External variance estimate needed for method="u". Ignored when method="v" or method="m" is specified.

id.basis

Index designating selected "knots".

nbasis

Number of "knots" to be selected. Ignored when id.basis is supplied.

seed

Seed to be used for the random generation of "knots". Ignored when id.basis is supplied.

cov

Input for covariance functions. See mkcov for details.

skip.iter

Flag indicating whether to use initial values of theta and skip theta iteration. See notes on skipping theta iteration.

fit

ssanova9 fit with ARMA error.

Skipping Theta Iteration

For the selection of multiple smoothing parameters, nlm is used to minimize the selection criterion, such as the GCV score. When the number of smoothing parameters is large, the process can be time-consuming because of the large number of function evaluations involved.

The starting values for the nlm iteration are obtained using Algorithm 3.2 in Gu and Wahba (1991). These starting values usually yield good estimates themselves, leaving the subsequent quasi-Newton iteration to pick up the "last 10%" of performance at a cost many times that of the initial computation. Thus, it is often a good idea to skip the iteration by specifying skip.iter=TRUE, especially in high dimensions and/or with multi-way interactions.

skip.iter=TRUE could be made the default in future releases.
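A rough way to see the trade-off is to time the same fit with and without the theta iteration. The sketch below assumes the gss package is installed; the simulated two-variable data and the object names (fit_full, fit_skip, t_full, t_skip) are illustrative only, and the timings will vary by machine and by the randomly selected "knots".

```r
# Sketch only: the body runs when the gss package is available.
if (requireNamespace("gss", quietly = TRUE)) {
  library(gss)
  set.seed(1)
  n <- 100
  x1 <- runif(n); x2 <- runif(n)
  y <- 5 + 3*sin(2*pi*x1) + 2*x2 + rnorm(n)

  # Full fit: nlm iterates over the multiple smoothing parameters.
  t_full <- system.time(
    fit_full <- ssanova9(y ~ x1*x2, cov = list("arma", c(1, 0)))
  )

  # Same model, keeping the starting values of theta (no theta iteration).
  t_skip <- system.time(
    fit_skip <- ssanova9(y ~ x1*x2, cov = list("arma", c(1, 0)),
                         skip.iter = TRUE)
  )

  # Elapsed seconds for each fit.
  print(c(full = t_full[["elapsed"]], skip = t_skip[["elapsed"]]))
}
```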

Details

The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form $$ y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e $$ with the terms denoted by "1", "x1", "x2", and "x1:x2".
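The expansion of such a formula into terms can be checked with base R before fitting. The following minimal sketch uses terms() from the stats package on the same formula; it only inspects the formula and does not call ssanova9. The constant term "1" shows up as the intercept flag rather than as a term label.

```r
# Expand the formula y ~ x1*x2 into its model terms using base R only.
f <- y ~ x1 * x2
tt <- terms(f)

# Main effects and the interaction, in model order.
term_labels <- attr(tt, "term.labels")
print(term_labels)

# The constant term "1" is reported through the intercept attribute.
has_constant <- attr(tt, "intercept") == 1
print(has_constant)
```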

The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

A subset of the observations is selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" \(q\) is determined by \(\max(30, 10n^{2/9})\), which is appropriate for the default cubic splines for numerical vectors.
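As a quick check of this rule, the sketch below evaluates it for a few sample sizes. The helper name default_nbasis is hypothetical, and rounding up with ceiling() is an assumption made here to obtain an integer; the text above gives only the formula.

```r
# Hypothetical helper: default number of "knots" for sample size n,
# following q = max(30, 10 * n^(2/9)).
# Assumption: the value is rounded up to an integer with ceiling().
default_nbasis <- function(n) {
  max(30, ceiling(10 * n^(2/9)))
}

n_values <- c(100, 1000, 10000)
q_values <- sapply(n_values, default_nbasis)
print(data.frame(n = n_values, q = q_values))
```

For n = 100 the lower bound of 30 applies, since 10n^{2/9} is below 30 at that sample size; the growth in q is slow for larger n.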

Using \(q\) "knots," ssanova calculates an approximate solution to the penalized least squares problem using algorithms of the order \(O(nq^{2})\), which for \(q \ll n\) scale better than the \(O(n^{3})\) algorithms of ssanova0. For the exact solution, one may set \(q=n\) in ssanova, but ssanova0 would be much faster.
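To give a feel for the difference in scaling, the following back-of-the-envelope sketch compares the two operation counts for a few sample sizes. It ignores constants and lower-order terms, so the ratios are only indicative, and it assumes q follows the default rule rounded up to an integer.

```r
# Rough operation counts, ignoring constants: n*q^2 versus n^3.
n <- c(100, 1000, 10000)
q <- pmax(30, ceiling(10 * n^(2/9)))  # assumed default number of "knots"

cost_approx <- n * q^2             # order of the approximate algorithm
cost_exact  <- n^3                 # order of the exact algorithm
ratio <- cost_exact / cost_approx  # algebraically equal to (n/q)^2

print(data.frame(n, q, cost_approx, cost_exact, ratio))
```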

References

Han, C. and Gu, C. (2008), Optimal smoothing with correlated data, Sankhya, 70-A, 38--72.

Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.

Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1--25. URL http://www.jstatsoft.org/v58/i05/.

Examples

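library(gss)
## simulate 100 observations with independent N(0,1) errors
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
## independent fit (known identity covariance)
fit <- ssanova9(y~x,cov=list("known",diag(1,100)))
## AR(1) fit
fit <- ssanova9(y~x,cov=list("arma",c(1,0)))
## fitted AR coefficient
para.arma(fit)
## MA(1) fit: errors are z[t+1] - 0.5*z[t] for i.i.d. N(0,1) z
e <- rnorm(101); e <- e[-1]-.5*e[-101]
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + e
fit <- ssanova9(y~x,cov=list("arma",c(0,1)))
## fitted MA coefficient
para.arma(fit)
## Clean up
if (FALSE) rm(x,y,e,fit)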
