
gss (version 2.2-8)

sscden: Estimating Conditional Probability Density Using Smoothing Splines

Description

Estimate conditional probability densities using smoothing spline ANOVA models. The symbolic model specification via formula follows the same rules as in lm.

Usage

sscden(formula, response, type=NULL, data=list(), weights, subset,
       na.action=na.omit, alpha=1.4, id.basis=NULL, nbasis=NULL,
       seed=NULL, ydomain=as.list(NULL), yquad=NULL, prec=1e-7,
       maxiter=30, skip.iter=FALSE)

sscden1(formula, response, type=NULL, data=list(), weights, subset,
        na.action=na.omit, alpha=1.4, id.basis=NULL, nbasis=NULL,
        seed=NULL, rho=list("xy"), ydomain=as.list(NULL), yquad=NULL,
        prec=1e-7, maxiter=30, skip.iter=FALSE)

Value

sscden returns a list object of class "sscden".

sscden1 returns a list object of class c("sscden1","sscden").

dsscden and cdsscden can be used to evaluate the estimated conditional densities \(f(y|x)\) and \(f(y_1|x,y_2)\); psscden, qsscden, cpsscden, and cqsscden can be used to evaluate conditional cdfs and quantiles. The methods project.sscden and project.sscden1 can be used to calculate the Kullback-Leibler or square-error projections for model selection.
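A minimal sketch of evaluating a fitted conditional density and cdf, reusing the penny fit from the Examples. It assumes the calling conventions dsscden(object, y, x) and psscden(object, q, x), with y and x passed as data frames, by analogy with the qsscden(fit, p, x) call in the Examples; the evaluation grid and the cut point 55 are arbitrary illustrations.

```r
library(gss)
data(penny); set.seed(5732)
fit <- sscden1(~year*mil, ~mil, data=penny,
               ydomain=data.frame(mil=c(49,61)))
# Response points (mil) and covariate points (year); values are illustrative
yy <- data.frame(mil=seq(49, 61, length=7))
xx <- data.frame(year=c(1950, 1980))
# Estimated f(mil | year); by analogy with qsscden, one column per row of xx
dens <- dsscden(fit, yy, xx)
# Estimated conditional cdf P(mil <= 55 | year) at the same years
cdf <- psscden(fit, 55, xx)
```

Densities should be non-negative and cdf values should lie in [0, 1]; inspect dim(dens) to confirm the row/column layout on your installation.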

Arguments

formula

Symbolic description of the model to be fit.

response

Formula listing response variables.

type

List specifying the type of spline for each variable. See mkterm for details.

data

Optional data frame containing the variables in the model.

weights

Optional vector of counts for duplicated data.

subset

Optional vector specifying a subset of observations to be used in the fitting process.

na.action

Function which indicates what should happen when the data contain NAs.

alpha

Parameter defining cross-validation scores for smoothing parameter selection.

id.basis

Index of observations to be used as "knots."

nbasis

Number of "knots" to be used. Ignored when id.basis is specified.

seed

Seed to be used for the random generation of "knots." Ignored when id.basis is specified.

ydomain

Data frame specifying the marginal support of the conditional density.

yquad

Quadrature for calculating integrals on the Y domain. Mandatory if response variables other than factors or numerical vectors are involved.

prec

Precision requirement for internal iterations.

maxiter

Maximum number of iterations allowed for internal iterations.

skip.iter

Flag indicating whether to use initial values of theta and skip theta iteration. See ssanova for notes on skipping theta iteration.

rho

The rho function needed by sscden1.
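The knot-related arguments can be fixed to make a fit reproducible. The call below is a sketch using the penny data from the Examples; nbasis=20 and seed=1 are arbitrary illustrative values, not recommendations. It uses sscden rather than sscden1, so the result should carry class "sscden" only, as described under Value.

```r
library(gss)
data(penny)
# 20 knots drawn at random with seed 1; id.basis is left NULL,
# so nbasis and seed are honored
fit0 <- sscden(~year*mil, ~mil, data=penny,
               ydomain=data.frame(mil=c(49,61)),
               nbasis=20, seed=1)
class(fit0)
```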

Details

The model is specified via formula and response, where response lists the response variables. For example, sscden(~y*x,~y) prescribes a model of the form $$ \log f(y|x) = g_{y}(y) + g_{xy}(x,y) + C(x) $$ with the terms denoted by "y" and "y:x"; the term(s) not involving the response(s) are removed, and the constant \(C(x)\) is determined by the fact that a conditional density integrates to one on the y axis. sscden1 keeps the terms not involving the response(s) during estimation, although those terms cancel out when one evaluates the estimated conditional density.
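Writing out the normalization makes the role of \(C(x)\) explicit. Since \(\int f(y|x)\,dy = 1\) for every \(x\), exponentiating the model above and integrating over the y support (as specified by ydomain) gives

$$ C(x) = -\log \int \exp\{g_{y}(y) + g_{xy}(x,y)\}\,dy . $$

A term \(g_{x}(x)\) depending on x alone would shift \(\log f(y|x)\) and \(C(x)\) by equal and opposite amounts, which is why such terms cancel in the evaluated conditional density from sscden1.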

The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

A subset of the observations is selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" \(q\) is determined by \(\max(30,10n^{2/9})\), which is appropriate for the default cubic splines for numerical vectors.
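The default knot count can be checked with a little arithmetic. This base-R sketch evaluates the stated formula directly; whether and how gss rounds \(q\) internally is not stated here, so the helper default_q is hypothetical and returns the unrounded value.

```r
# Hypothetical helper: default knot count q = max(30, 10 * n^(2/9))
# as stated in the text; no rounding is applied.
default_q <- function(n) max(30, 10 * n^(2/9))
default_q(90)     # penny has 90 rows: 10 * 90^(2/9) is about 27.2, so q = 30
default_q(10000)  # about 77.4
```

Since \(10n^{2/9} > 30\) exactly when \(n > 3^{4.5} \approx 140.3\), the floor of 30 knots applies to every sample of 140 or fewer observations.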

References

Gu, C. (1995), Smoothing spline density estimation: Conditional distribution. Statistica Sinica, 5, 709-726.

Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.

Examples

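library(gss)
data(penny); set.seed(5732)
## Fit log f(mil|year) with mil supported on [49, 61]
fit <- sscden1(~year*mil, ~mil, data=penny,
               ydomain=data.frame(mil=c(49,61)))
## Conditional quantiles of mil on a half-year grid of years
yy <- 1944+(0:92)/2
quan <- qsscden(fit, c(.05,.25,.5,.75,.95),
                data.frame(year=yy))
## Jittered scatter plot with the five quantile curves overlaid
plot(penny$year+.1*rnorm(90), penny$mil, ylim=c(49,61))
for (i in 1:5) lines(yy, quan[i,])
## Clean up
if (FALSE) rm(penny,yy,quan)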
