
ldr (version 1.3.3)

screen.pfc: Adaptive Screening of Predictors

Description

Given a set of $p$ predictors and a response, this function selects all predictors that are statistically related to the response at a specified significance level, using a flexible basis function.

Usage

screen.pfc(X, fy, cutoff=0.1)

Arguments

X
Matrix or data frame with $n$ rows of observations and $p$ columns of continuous predictors.
fy
Basis function of the response y, given as a matrix with one column per basis component. It is used to capture the dependence between each individual predictor and the response. See bf for details.
cutoff
Significance level used as the screening cutoff; defaults to 0.1.
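The role of cutoff can be illustrated with a small base-R sketch. It assumes that a predictor is retained when its p-value falls strictly below cutoff, which is how the Description reads but is not taken from the package source. The F statistics are made up, and their p-values are computed for an assumed sample size of n = 30 with a single-column basis (r = 1), arranged in the F, P-value and Index columns that the Value section lists.

```r
# Made-up F statistics for five predictors (illustration only)
Fstat <- c(12.4, 0.8, 6.1, 3.9, 0.1)
n <- 30  # assumed sample size
r <- 1   # assumed number of basis columns
# Upper-tail p-values of the F(r, n - r - 1) distribution
pv <- pf(Fstat, r, n - r - 1, lower.tail = FALSE)
out <- data.frame(F = Fstat, "P-value" = pv, Index = seq_along(Fstat),
                  check.names = FALSE)
cutoff <- 0.1
# Assumed rule: keep predictor j when its p-value is below cutoff
kept <- out[out[["P-value"]] < cutoff, ]
# Indices of the retained predictors, strongest first
selected <- kept$Index[order(kept$F, decreasing = TRUE)]
selected
```

Here predictors 1, 3 and 4 pass the 0.1 level (in that order of strength), while predictors 2 and 5 are screened out.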

Value

Returns a data frame with $p$ rows, one per predictor, and the following columns:
  • F: the F statistic for testing the hypothesis $\phi = 0$ described in Details.
  • P-value: the p-value of the test statistic. The F test has $r$ and $n-r-1$ degrees of freedom, where $r$ is the number of columns of fy; with a single-column basis these are 1 and $n-2$.
  • Index: the index of the variable, i.e. its column position $j$ in X.
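With a single-column basis ($r = 1$), for example a centered $f_y = y$, the F statistic depends on the data only through $n$ and the sample correlation $R_j$ between $X_j$ and $y$: $F_j = (n-2)R_j^2/(1-R_j^2)$, referred to the F distribution with 1 and $n-2$ degrees of freedom. The base-R sketch below checks this on simulated data (an illustrative assumption, not the OH data) by comparing against cor.test, whose squared t statistic equals $F_j$.

```r
# Simulated data (illustrative only): one predictor linearly related to y
set.seed(2)
n <- 40
y <- rnorm(n)
x <- 0.5 * y + rnorm(n)
# F statistic for a single-column basis, written via the sample correlation
R  <- cor(x, y)
F1 <- (n - 2) * R^2 / (1 - R^2)
p1 <- pf(F1, 1, n - 2, lower.tail = FALSE)  # F test with 1 and n - 2 df
# Pearson correlation t test (n - 2 df) for comparison
ct <- cor.test(x, y)
c(F = F1, t_squared = unname(ct$statistic)^2, p_F = p1, p_t = ct$p.value)
```

This is why the degree-1 polynomial basis in the examples amounts to correlation screening.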

Details

For each predictor $X_j$, consider the model
$$X_j = \mu + \phi f_y + \epsilon,$$
where $f_y$ is a flexible basis function of the response supplied by the user (for example, one constructed with bf) and $\phi$ is a $1 \times r$ vector of coefficients. The screening procedure tests the null hypothesis $\phi = 0$ against the alternative $\phi \ne 0$. Given the $r$ components of $f_y$, the model is a linear model with $X_j$ as the response and the components of $f_y$ as the predictors, so the test on $\phi$ is the usual F-test. Specifically, let $\hat{\phi}_j$ be the ordinary least squares estimator of $\phi$ for predictor $X_j$, and let $\mathbf{f}_{y_i}$ denote the basis evaluated at $y_i$, taken to be centered over the sample so that the residuals below are those of the fit with intercept $\mu$. The test statistic is
$$F_j = \frac{n-r-1}{r} \cdot \frac{\sum_{i=1}^n \left[(X_{ji}-\bar{X}_{j.})^2 - (X_{ji}-\bar{X}_{j.} - \hat{\phi}_j \mathbf{f}_{y_i})^2\right]}{\sum_{i=1}^n (X_{ji}-\bar{X}_{j.} - \hat{\phi}_j \mathbf{f}_{y_i})^2},$$
where $\bar{X}_{j.}=\sum_{i=1}^n X_{ji}/n$. Under the null hypothesis, and assuming normal errors, $F_j$ follows an $F$ distribution with $(r, n-r-1)$ degrees of freedom, so the sample size $n$ must exceed $r+1$.
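The statistic can be computed for all predictors at once with a few matrix operations. The base-R sketch below uses a hypothetical helper, screen_f_sketch(), written for illustration and not taken from the ldr source. It centers both X and the basis (so the residuals are those of the fit with intercept $\mu$), obtains $\hat{\phi}_j$ for every column by least squares, and applies the formula for $F_j$. The simulated data and the basis $(y, y^2)$ are assumptions made for the demonstration, which cross-checks the first predictor against the overall F statistic reported by summary(lm(...)).

```r
# Hypothetical helper (not the ldr implementation): per-predictor F statistics
# for the regression of each column of X on the basis fy, with an intercept.
screen_f_sketch <- function(X, fy) {
  X  <- as.matrix(X)
  fy <- as.matrix(fy)
  n <- nrow(X)
  r <- ncol(fy)
  Xc  <- scale(X,  center = TRUE, scale = FALSE)  # X_ji minus the mean of X_j
  fyc <- scale(fy, center = TRUE, scale = FALSE)  # centered basis
  phi_hat <- qr.solve(fyc, Xc)                    # r x p least squares coefficients
  resid <- Xc - fyc %*% phi_hat                   # residuals for every predictor
  rss0 <- colSums(Xc^2)                           # intercept-only model
  rss1 <- colSums(resid^2)                        # model including the basis
  Fstat <- ((n - r - 1) / r) * (rss0 - rss1) / rss1
  data.frame(F = Fstat,
             "P-value" = pf(Fstat, r, n - r - 1, lower.tail = FALSE),
             Index = seq_len(ncol(X)),
             check.names = FALSE)
}

# Simulated example (illustrative assumption): x1 and x3 depend on y, x2 does not
set.seed(1)
n <- 50
y <- rnorm(n)
X <- cbind(x1 = 2 * y + rnorm(n), x2 = rnorm(n), x3 = y^2 + rnorm(n))
fy <- cbind(y, y^2)  # r = 2 basis columns
res <- screen_f_sketch(X, fy)
res

# Cross-check predictor 1 against the overall F test from lm()
f_lm <- summary(lm(X[, 1] ~ fy))$fstatistic[["value"]]
all.equal(unname(res$F[1]), f_lm)
```

Solving one least squares problem with a matrix right-hand side fits all $p$ regressions together, which is what makes this kind of marginal screening cheap even when $p$ is large.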

References

Adragni, K. P. and Cook, R. D. (2008). Discussion on "Sure Independence Screening for Ultrahigh Dimensional Feature Space" by Jianqing Fan and Jinchi Lv (2007). Journal of the Royal Statistical Society, Series B, 70, Part 5, pp. 1-35.

Examples

library(ldr)
data(OH)
# Response y is column 295; predictors are all columns except 1 and 295
X <- OH[, -c(1, 295)]
y <- OH[, 295]

# Correlation screening with a linear (degree-1 polynomial) basis
out <- screen.pfc(X, fy=bf(y, case="poly", degree=1))
head(out)

# User-specified basis: centered columns y and sqrt(y)
out1 <- screen.pfc(X, fy=scale(cbind(y, sqrt(y)), center=TRUE, scale=FALSE))
head(out1)

# Piecewise constant basis with 10 slices
out2 <- screen.pfc(X, fy=bf(y, case="pdisc", degree=0, nslices=10))
head(out2)
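The piecewise constant basis in the last example groups the observations into slices of y. The following base-R sketch shows why such a basis can detect a relationship that correlation screening misses; it does not reproduce bf. It assumes equal-count slices formed from sample quantiles, with one indicator column per slice except the first (so $r$ = nslices - 1 = 9), and uses simulated data in which $x$ depends on $y$ through $\cos(2\pi y)$, which is uncorrelated with $y$ when $y$ is uniform on (0, 1). With this basis, the F test for $\phi = 0$ coincides with a one-way ANOVA of $x$ across the slices.

```r
# Simulated data (illustrative): x depends on y, but not linearly
set.seed(3)
n <- 100
y <- runif(n)
x <- cos(2 * pi * y) + rnorm(n, sd = 0.3)

# Assumed slice construction: 10 equal-count slices from sample quantiles
nslices <- 10
breaks <- quantile(y, probs = seq(0, 1, length.out = nslices + 1))
slice <- cut(y, breaks = breaks, include.lowest = TRUE)
fy <- model.matrix(~ slice)[, -1]  # indicator columns, first slice as baseline
r <- ncol(fy)                      # nslices - 1 = 9

# F test for phi = 0 with this basis, and the same test as a one-way ANOVA
f_basis <- summary(lm(x ~ fy))$fstatistic
f_anova <- anova(lm(x ~ slice))[["F value"]][1]
# Degree-1 (correlation) F test on the same predictor, for contrast
f_linear <- summary(lm(x ~ y))$fstatistic
c(F_slices = unname(f_basis["value"]), F_anova = f_anova,
  F_linear = unname(f_linear["value"]))
```

The slice-based statistic is large while the linear one is not, illustrating the flexibility that the choice of fy provides.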
