Fit smoothing spline ANOVA models with correlated Gaussian data. The symbolic model specification via formula follows the same rules as in lm.
ssanova9(formula, type=NULL, data=list(), subset, offset,
         na.action=na.omit, partial=NULL, method="v", alpha=1.4,
         varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, cov,
         skip.iter=FALSE)

para.arma(fit)
ssanova9 returns a list object of class c("ssanova9","ssanova").

The method summary.ssanova9 can be used to obtain summaries of the fits. The method predict.ssanova can be used to evaluate the fits at arbitrary points, along with standard errors. The method project.ssanova9 can be used to calculate the Kullback-Leibler projection for model selection. The methods residuals.ssanova and fitted.ssanova extract the residuals and fitted values, respectively, from the fits.

para.arma returns the fitted ARMA coefficients of a fit obtained with cov=list("arma",c(p,q)) in the call to ssanova9.
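A minimal sketch of the workflow described above, assuming the gss package is installed. It simulates a curve with AR(1) errors (the coefficient 0.6 and the seed are arbitrary choices), fits it with cov=list("arma",c(1,0)), and then calls para.arma, predict, fitted, and residuals. The gss calls are wrapped in requireNamespace so the block still runs without gss. It assumes that the ARMA errors follow the row order of the data, as in the package examples, and that predict with se.fit=TRUE returns a list with components fit and se.fit. The pointwise band of 1.96 standard errors is an illustrative choice, not a package feature.

```r
# Sketch: fit with AR(1) errors and use the accessor methods.
# Assumes gss is installed; phi = 0.6 and the seed are arbitrary.
set.seed(2024)
n <- 100
x <- runif(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))  # AR(1) errors in row order
y <- 5 + 3 * sin(2 * pi * x) + e
grid <- data.frame(x = seq(0, 1, length.out = 50))          # evaluation points

band <- NULL
if (requireNamespace("gss", quietly = TRUE)) {
  fit <- gss::ssanova9(y ~ x, cov = list("arma", c(1, 0)))
  print(gss::para.arma(fit))                 # fitted AR(1) coefficient
  pr <- predict(fit, grid, se.fit = TRUE)    # dispatches to predict.ssanova
  # Illustrative pointwise band: fit +/- 1.96 standard errors
  band <- cbind(lower = pr$fit - 1.96 * pr$se.fit,
                upper = pr$fit + 1.96 * pr$se.fit)
  fv <- fitted(fit)                          # fitted.ssanova
  res <- residuals(fit)                      # residuals.ssanova
}
```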
formula: Symbolic description of the model to be fit.

type: List specifying the type of spline for each variable. See mkterm for details.

data: Optional data frame containing the variables in the model.

subset: Optional vector specifying a subset of observations to be used in the fitting process.

offset: Optional offset term with known parameter 1.

na.action: Function that indicates what should happen when the data contain NAs.

partial: Optional symbolic description of parametric terms in partial spline models.

method: Method for smoothing parameter selection. Supported values are method="v" for V, method="m" for M, and method="u" for U; see the references for definitions of U, V, and M.

alpha: Parameter modifying V or U; larger absolute values yield smoother fits. Ignored when method="m" is specified.

varht: External variance estimate needed for method="u". Ignored when method="v" or method="m" is specified.

id.basis: Index designating selected "knots".

nbasis: Number of "knots" to be selected. Ignored when id.basis is supplied.

seed: Seed to be used for the random generation of "knots". Ignored when id.basis is supplied.

cov: Input for covariance functions. See mkcov for details.

skip.iter: Flag indicating whether to use initial values of theta and skip theta iteration. See the notes on skipping theta iteration.

fit: An ssanova9 fit with ARMA errors.
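For the cov argument, the example further down passes a fixed matrix via cov=list("known",diag(1,100)) for independent errors. As a sketch, a stationary AR(1) covariance matrix can be built in base R and checked against stats::ARMAacf. The helper name ar1_cov is hypothetical, and whether mkcov expects this matrix itself or a rescaled version is an assumption to verify in the mkcov documentation.

```r
# Hypothetical helper: stationary AR(1) covariance
# Cov(e_i, e_j) = sigma2 * phi^|i - j| / (1 - phi^2)
ar1_cov <- function(n, phi, sigma2 = 1) {
  stopifnot(abs(phi) < 1)                      # stationarity
  lag <- abs(outer(seq_len(n), seq_len(n), "-"))
  sigma2 * phi^lag / (1 - phi^2)
}

V <- ar1_cov(100, phi = 0.5)

# With gss attached, V could then be supplied as a "known" covariance
# (assumption: mkcov accepts it in this form):
# fit <- ssanova9(y ~ x, cov = list("known", V))
```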
For the selection of multiple smoothing parameters, nlm is used to minimize the selection criterion, such as the GCV score. When the number of smoothing parameters is large, the process can be time-consuming because of the large number of function evaluations involved.

The starting values for the nlm iteration are obtained using Algorithm 3.2 in Gu and Wahba (1991). These starting values usually yield good estimates themselves, leaving the subsequent quasi-Newton iteration to pick up the "last 10%" of performance at an extra cost that is often many times that of the initial step. Thus, it is often a good idea to skip the iteration by specifying skip.iter=TRUE, especially in high dimensions and/or with multi-way interactions. skip.iter=TRUE could be made the default in future releases.
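Minimizing a selection criterion with nlm can be illustrated on a toy problem with a single smoothing parameter. This sketch is not the gss implementation. It swaps in a Whittaker smoother, a second-difference penalty on the sorted responses that treats the design points as equally spaced. It then minimizes a V score, assumed here to take the independent-data form (RSS/n)/(1 - alpha tr(A)/n)^2, over log(lambda). The names hat_matrix and score_v are hypothetical, and the starting value and stepmax are ad hoc. With alpha > 1, the score has a pole where tr(A) = n/alpha. For that reason, the search starts from a heavily smoothed fit and the score returns a large value past the pole.

```r
# Toy sketch: pick one smoothing parameter by minimizing a V-type score with nlm().
set.seed(42)
n <- 100
x <- sort(runif(n))
y <- 5 + 3 * sin(2 * pi * x) + rnorm(n)

D <- diff(diag(n), differences = 2)  # (n - 2) x n second-difference operator
P <- crossprod(D)                    # roughness penalty matrix D'D

# Smoothing ("hat") matrix A(lambda) = (I + lambda * P)^{-1}
hat_matrix <- function(lambda) solve(diag(n) + lambda * P)

# V(lambda) = (RSS / n) / (1 - alpha * tr(A) / n)^2; alpha > 1 charges more for
# effective degrees of freedom, which favors smoother fits.
score_v <- function(loglam, alpha = 1.4) {
  A <- hat_matrix(exp(loglam))
  denom <- 1 - alpha * sum(diag(A)) / n
  if (denom <= 0) return(1e10)       # guard: stay on the smooth side of the pole
  rss <- sum((y - A %*% y)^2)
  (rss / n) / denom^2
}

# Quasi-Newton search over log(lambda); stepmax keeps each step small.
opt <- nlm(score_v, p = log(1e4), alpha = 1.4, stepmax = 2)
A_hat <- hat_matrix(exp(opt$estimate))
edf <- sum(diag(A_hat))              # effective degrees of freedom
yhat <- drop(A_hat %*% y)
```

In ssanova9 itself, the corresponding shortcut for models with many smoothing parameters is a call such as ssanova9(y~x1*x2, cov=list("arma",c(1,0)), skip.iter=TRUE). That call keeps the Algorithm 3.2 starting values and skips the nlm stage.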
The model specification via formula is intuitive. For example, y~x1*x2 yields a model of the form

$$
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
$$

with the terms denoted by "1", "x1", "x2", and "x1:x2".
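The formula follows the rules of lm, so base R's terms() can preview the term labels before fitting. This shows only how R expands y~x1*x2. It does not show how ssanova9 builds the spline terms from those labels.

```r
# Expand the formula symbolically; the variables need not exist yet.
tt <- terms(y ~ x1 * x2)

# Prepend "1" for the constant C when the model has an intercept.
term_names <- c(if (attr(tt, "intercept") == 1) "1", attr(tt, "term.labels"))
print(term_names)  # term_names is c("1", "x1", "x2", "x1:x2")
```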
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations is selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" \(q\) is determined by \(\max(30, 10n^{2/9})\), which is appropriate for the default cubic splines for numerical vectors.

Using \(q\) "knots," ssanova calculates an approximate solution to the penalized least squares problem using algorithms of order \(O(nq^{2})\), which for \(q \ll n\) scale better than the \(O(n^{3})\) algorithms of ssanova0. For the exact solution, one may set \(q=n\) in ssanova, but ssanova0 would be much faster.
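The default knot count and the seeded random selection can be mimicked in base R. This is a sketch with hypothetical helper names. Rounding \(10n^{2/9}\) up with ceiling() and capping the count at \(n\) are assumptions of the sketch, not confirmed package behavior.

```r
# Hypothetical helpers mirroring the documented defaults for "knots".
# Rounding up with ceiling() and capping at n are assumptions.
default_nbasis <- function(n) {
  min(n, max(30, ceiling(10 * n^(2/9))))
}

# Draw knot indices at random, optionally reproducibly via a seed
# (note: set.seed() resets the global random number stream).
pick_knots <- function(n, nbasis = default_nbasis(n), seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sort(sample.int(n, nbasis))  # sampled without replacement
}

sapply(c(100, 1000, 10000), default_nbasis)
knots <- pick_knots(500, seed = 1)
```

Such indices could be passed as id.basis, for example ssanova9(y~x, cov=list("arma",c(1,0)), id.basis=knots). In that case, nbasis and seed are ignored.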
Han, C. and Gu, C. (2008). Optimal smoothing with correlated data. Sankhya, 70-A, 38-72.

Gu, C. (2013). Smoothing Spline ANOVA Models (2nd ed.). New York: Springer-Verlag.

Gu, C. (2014). Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
library(gss)
## Simulated curve with independent N(0,1) noise
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(100)
## independent fit: identity covariance supplied as "known"
fit <- ssanova9(y~x, cov=list("known", diag(1,100)))
## AR(1) fit
fit <- ssanova9(y~x, cov=list("arma", c(1,0)))
para.arma(fit)
## MA(1) fit: e[t] = z[t+1] - 0.5*z[t] with iid N(0,1) draws z
e <- rnorm(101); e <- e[-1] - .5*e[-101]
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + e
fit <- ssanova9(y~x, cov=list("arma", c(0,1)))
para.arma(fit)
## Clean up
rm(x, y, e, fit)
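The MA(1) errors in the example above can be checked against theory in base R. In stats::arima's sign convention, e[t] = z[t+1] - 0.5*z[t] is an MA(1) series with coefficient theta = -0.5, so its lag-1 autocorrelation is theta/(1 + theta^2) = -0.4. Whether para.arma reports the coefficient with the same sign convention is not stated here and should be treated as an open assumption.

```r
# MA(1) construction from the example: e[t] = z[t+1] - 0.5 * z[t], z iid N(0, 1),
# i.e. theta = -0.5 in the e[t] = w[t] + theta * w[t-1] convention of stats::arima.
theta <- -0.5
rho1 <- theta / (1 + theta^2)                   # theoretical lag-1 autocorrelation
acf_theory <- ARMAacf(ma = theta, lag.max = 1)  # lags 0 and 1

# Same construction on a long series, to compare with the sample ACF
set.seed(7)
m <- 10000
z <- rnorm(m + 1)
e_long <- z[-1] - 0.5 * z[-(m + 1)]
rho1_hat <- acf(e_long, lag.max = 1, plot = FALSE)$acf[2]
```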