ldr (version 1.3.3)

pfc: Principal fitted components

Description

Principal fitted components model for sufficient dimension reduction. This function estimates all parameters in the model.

Usage

pfc(X, y, fy = NULL, numdir = NULL, structure = c("iso", "aniso",
    "unstr", "unstr2"), eps_aniso = 1e-3, numdir.test = FALSE, ...)

Arguments

X
Design matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.
y
The response vector of n observations, continuous or categorical.
fy
Basis function, obtained using bf or defined by the user. It is a function of y alone, with r independent column vectors. See bf for details.
numdir
The number of directions to be used in estimating the reduction subspace. The dimension must be less than or equal to the minimum of r and p. By default numdir=$\min(r, p)$.
structure
Structure of var(X|Y). The following options are available: "iso" for isotropic (predictors, conditionally on the response, are independent and on the same measurement scale); "aniso" for anisotropic (predictors, conditionally on the response, are independent and on different measurement scales); "unstr" for unstructured, allowing a general covariance structure; and "unstr2" for the extended structure $\Delta=\Gamma \Omega \Gamma^T + \Gamma_0 \Omega_0 \Gamma_0^T$ described in Details.
eps_aniso
Precision term used in estimating var(X|Y) for the anisotropic structure.
numdir.test
Boolean. If FALSE, pfc fits the model with the provided numdir only. If TRUE, PFC models are fit for all dimensions less than or equal to numdir.
...
Additional arguments to Grassmannoptim.
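As a minimal sketch of how the arguments fit together, fy is typically built with bf from the same response passed to y. The data below are hypothetical, generated only for illustration; it assumes the ldr package is installed and loaded:

```r
library(ldr)

# Hypothetical data: n = 100 observations, p = 5 continuous predictors
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- X[, 1] + rnorm(100)

# A cubic polynomial basis of y gives r = 3 basis columns
fy <- bf(y = y, case = "poly", degree = 3)

# Fit an isotropic PFC model with a 1-dimensional reduction
fit <- pfc(X = X, y = y, fy = fy, numdir = 1, structure = "iso")
```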

Value

  • pfc returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided, corresponding to the numdir values (1 through numdir), for each of the parameters $\mu$, $\beta$, $\Gamma$, $\Gamma_0$, $\Omega$, and $\Omega_0$; otherwise, a single list of matrices is returned for the given value of numdir. The outputs loglik, aic, bic, and numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The components returned are the following:
  • R: The reduction data-matrix of $X$ obtained using the centered data-matrix of $X$. The centering is such that each column vector is centered around its sample mean.
  • Muhat: Estimate of $\mu$.
  • Betahat: Estimate of $\beta$.
  • Deltahat: The estimate of the covariance $\Delta$.
  • Gammahat: An estimated orthogonal basis representative of $\hat{\mathcal{S}}_{\Gamma}$, the subspace spanned by $\Gamma$.
  • Gammahat0: An estimated orthogonal basis representative of $\hat{\mathcal{S}}_{\Gamma_0}$, the subspace spanned by $\Gamma_0$.
  • Omegahat: The estimate of the covariance $\Omega$ if an extended model is used.
  • Omegahat0: The estimate of the covariance $\Omega_0$ if an extended model is used.
  • loglik: The value of the log-likelihood for the model.
  • aic: Akaike information criterion value.
  • bic: Bayesian information criterion value.
  • numdir: The number of directions estimated.
  • numpar: The number of parameters in the model.
  • evalues: The first numdir largest eigenvalues of $\hat{\Sigma}_{\mathrm{fit}}$.
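The components above are extracted from the returned ldr object with the usual $ operator. A brief sketch, assuming fit is an object returned by pfc with numdir.test=FALSE:

```r
fit$Gammahat         # estimated orthogonal basis of the reduction subspace
fit$R                # n x numdir matrix of reduced predictors
fit$loglik           # scalar log-likelihood (a vector if numdir.test=TRUE)
c(fit$aic, fit$bic)  # information criteria for model comparison
```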

Details

Let $X$ be a column vector of $p$ predictors, and let $Y$ be a univariate response variable. The principal fitted components (PFC) model is an inverse regression model for sufficient dimension reduction, given by $X|(Y=y) \sim N(\mu + \Gamma \beta f_y, \Delta)$. The term $\Delta$ is assumed independent of $y$. Its simplest structure is the isotropic (iso) one, with $\Delta=\delta^2 I_p$, where, conditionally on the response, the predictors are independent and on the same measurement scale. The sufficient reduction is then $\Gamma^T X$. The anisotropic (aniso) PFC model assumes that $\Delta=\mathrm{diag}(\delta_1^2, \ldots, \delta_p^2)$, where the conditional predictors are independent and on different measurement scales. The unstructured (unstr) PFC model allows a general structure for $\Delta$. With the anisotropic and unstructured $\Delta$, the sufficient reduction is $\Gamma^T \Delta^{-1} X$. Note that $X \in R^{p}$, while the data-matrix to use is in $R^{n \times p}$.

The error structure of the extended (unstr2) model has the form $$\Delta=\Gamma \Omega \Gamma^T + \Gamma_0 \Omega_0 \Gamma_0^T,$$ where $\Gamma_0$ is the orthogonal completion of $\Gamma$ such that $(\Gamma, \Gamma_0)$ is a $p \times p$ orthogonal matrix. The matrices $\Omega \in R^{d \times d}$ and $\Omega_0 \in R^{(p-d) \times (p-d)}$ are assumed to be symmetric and full-rank. The sufficient reduction is $\Gamma^{T} X$.

Let $\mathcal{S}_{\Gamma}$ be the subspace spanned by the columns of $\Gamma$. The parameter space of $\mathcal{S}_{\Gamma}$ is the set of all $d$-dimensional subspaces of $R^p$, called the Grassmann manifold and denoted by $\mathcal{G}_{(d,p)}$. Let $\hat{\Sigma}$ and $\hat{\Sigma}_{\mathrm{fit}}$ be the sample covariance matrix of $X$ and the fitted covariance matrix, and let $\hat{\Sigma}_{\mathrm{res}}=\hat{\Sigma} - \hat{\Sigma}_{\mathrm{fit}}$. The MLE of $\mathcal{S}_{\Gamma}$ under the unstr2 setup is obtained by maximizing the log-likelihood $$L(\mathcal{S}_U) = - \log|U^T \hat{\Sigma}_{\mathrm{res}} U| - \log|V^T \hat{\Sigma} V|$$ over $\mathcal{G}_{(d,p)}$, where $V$ is an orthogonal completion of $U$. The dimension $d$ of the sufficient reduction must be estimated. A sequential likelihood ratio test is implemented, as well as the Akaike and Bayesian information criteria, following Cook and Forzani (2008).
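The dimension estimation described above can be sketched by fitting with numdir.test=TRUE and comparing the information criteria across candidate dimensions. The objects X and y below are assumed to be a predictor matrix and response as in the Arguments section:

```r
fit <- pfc(X = X, y = y, fy = bf(y = y, case = "poly", degree = 3),
           numdir = 3, structure = "unstr", numdir.test = TRUE)

# With numdir.test=TRUE, aic and bic are vectors over dimensions 1..numdir;
# the minimizing index is a point estimate of d
d_aic <- which.min(fit$aic)
d_bic <- which.min(fit$bic)

summary(fit)  # also reports the sequential likelihood ratio tests
```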

References

Adragni, KP and Cook, RD (2009). Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A 367, 4385--4405.

Cook, RD (2007). Fisher Lecture: Dimension Reduction in Regression (with discussion). Statistical Science 22, 1--26.

Cook, RD and Forzani, L (2008). Principal fitted components for dimension reduction in regression. Statistical Science 23, 485--501.

See Also

core, lad

Examples

library(ldr)
data(bigmac)

fit1 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=bf(y=bigmac[,1], case="poly",
        degree=3), numdir=3, structure="aniso")
summary(fit1)
plot(fit1)

fit2 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=bf(y=bigmac[,1], case="poly",
        degree=3), numdir=3, structure="aniso", numdir.test=TRUE)
summary(fit2)
