classif.gsam.vs: Variable Selection in Functional Data Classification

Description

Fits a classification model while selecting the relevant functional (and non-functional) explanatory variables.

Usage

classif.gsam.vs(
  data = list(),
  y,
  x,
  family = binomial(),
  weights = "equal",
  basis.x = NULL,
  basis.b = NULL,
  type = "1vsall",
  prob = 0.5,
  alpha = 0.05,
  dcor.min = 0.01,
  smooth = TRUE,
  measure = "accuracy",
  xydist,
  ...
)

Value

Returns the final fitted model (the same object returned by the classification method) plus:

dcor, matrix with the distance correlation between each potential covariate (by column) and the residual of the model at each step (by row).
i.predictor, vector with 1 if the variable is selected and 0 otherwise.
ipredictor, vector with the names of the selected variables (in order of selection).

Arguments

data

List containing the variables in the model. The "df" element is a data.frame with the response and the scalar covariates (numeric and factor variables are allowed). Functional covariates of class fdata or fd are included as additional elements of the data list.

y

Character string with the name of the scalar response variable.

x

Character vector with the names of the scalar and functional potential covariates.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)

weights

Weights:

If a character string: 'equal' (the default) gives the same weight to every observation, and 'inverse' uses inverse-probability weighting.
If a numeric vector of length n: the weight of each observation.
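A minimal base R sketch of the two character options, assuming that 'inverse' weights each observation by the reciprocal of the proportion of its class, so that minority classes count as much as majority ones. The rescaling to mean 1 is a choice of this sketch and not necessarily what the package does.

```r
# Hypothetical unbalanced response: six observations of class 0, two of class 1
y <- factor(c(0, 0, 0, 1, 1, 0, 0, 0))

# 'equal': every observation gets the same weight
w.equal <- rep(1, length(y))

# 'inverse' (assumed form): weight = 1 / proportion of the observation's class
prop <- table(y) / length(y)
w.inverse <- as.numeric(1 / prop[as.character(y)])
w.inverse <- w.inverse / mean(w.inverse)  # rescaled to mean 1 (sketch choice)

# Each class now carries the same total weight
tapply(w.inverse, y, sum)
```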

basis.x

List of bases for the estimation of the functional explanatory data.

basis.b

List of bases for the estimation of the functional beta parameters.

type

Character, type of classification scheme. The '1vsall' strategy (the default) trains a single classifier per class, with the samples of that class as positive samples and all other samples as negatives. The other possibility for a K-way multiclass problem is the 'majority' voting scheme (also called one vs one), which trains the \(K (K - 1) / 2\) pairwise binary classifiers and predicts as final class label the one predicted most frequently.
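To make the two schemes concrete, this base R sketch counts the binary classifiers each scheme needs for K = 4 classes and combines the one-vs-one decisions by majority vote. The class names and the per-class scores that decide each pairwise duel are made up for illustration; in the package each pairwise decision comes from a fitted binary classifier.

```r
classes <- c("a", "b", "c", "d")
K <- length(classes)

# '1vsall': one binary classifier per class (class k vs the rest)
n.1vsall <- K

# 'majority' (one vs one): one binary classifier per pair of classes
pairs <- combn(classes, 2)   # 2 x K(K-1)/2 matrix of class pairs
n.majority <- ncol(pairs)    # K(K-1)/2 = 6 when K = 4

# Hypothetical pairwise decisions for one observation: in each duel the class
# with the larger made-up score wins
score <- c(a = 0.2, b = 0.9, c = 0.4, d = 0.1)
winners <- apply(pairs, 2, function(p) p[which.max(score[p])])

# Final label: the class predicted most frequently across the pairwise duels
votes <- table(factor(winners, levels = classes))
pred <- names(votes)[which.max(votes)]
pred
```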

prob

Probability threshold used to assign the class in binary classification.
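A short sketch of how a probability threshold turns estimated probabilities into class labels. The fitted probabilities are hypothetical, and the strict inequality (class 1 only when the probability exceeds prob) is an assumption of this sketch about how ties at the threshold are handled.

```r
# Hypothetical fitted probabilities P(Y = 1 | X) for four observations
p.hat <- c(0.10, 0.55, 0.50, 0.80)
prob  <- 0.5

# Assumed rule: predict class 1 only when p.hat exceeds prob
yhat <- factor(ifelse(p.hat > prob, 1, 0), levels = c(0, 1))
yhat
```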

alpha

Significance level of the test of independence between covariate X and residual e (the null hypothesis is that X and e are independent). The default is 0.05.

dcor.min

Lower threshold for a variable X to be considered: X is discarded if its distance correlation with the residual e satisfies \(R(X,e) < dcor.min\).
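The alpha and dcor.min arguments both act when screening a candidate covariate against the current residual. The base R sketch here computes the sample distance correlation (Székely, Rizzo and Bakirov, 2007) for scalar variables and combines the dcor.min threshold with a permutation test of independence at level alpha. The function names dcor.sketch and screen.sketch, the permutation scheme and the number of permutations B are assumptions of this illustration, not the package's internal implementation; functional covariates would need a distance between curves instead of dist() on a vector.

```r
# Sample distance correlation of two numeric vectors (V-statistic form)
dcor.sketch <- function(x, e) {
  a <- as.matrix(dist(x))                                  # |x_i - x_j|
  b <- as.matrix(dist(e))                                  # |e_i - e_j|
  A <- a - outer(rowMeans(a), colMeans(a), "+") + mean(a)  # double centring
  B <- b - outer(rowMeans(b), colMeans(b), "+") + mean(b)
  v <- mean(A * A) * mean(B * B)                           # dVar^2(x) * dVar^2(e)
  if (v == 0) return(0)                                    # a constant variable
  sqrt(max(0, mean(A * B)) / sqrt(v))                      # dCor in [0, 1]
}

# Hypothetical screening rule: keep X only if R(X, e) >= dcor.min and a
# permutation test of independence between X and e rejects at level alpha
screen.sketch <- function(x, e, alpha = 0.05, dcor.min = 0.01, B = 199) {
  r <- dcor.sketch(x, e)
  if (r < dcor.min) return(list(dcor = r, p.value = NA, keep = FALSE))
  perm <- replicate(B, dcor.sketch(x, sample(e)))  # null: permuted residuals
  p <- (1 + sum(perm >= r)) / (B + 1)
  list(dcor = r, p.value = p, keep = p < alpha)
}

set.seed(1)
e <- rnorm(60)                                    # stand-in for a residual
screen.sketch(e^2 + rnorm(60, sd = 0.1), e)$keep  # nonlinear dependence
screen.sketch(rep(1, 60), e)$keep                 # constant: R = 0 < dcor.min
```

Distance correlation is used instead of Pearson correlation because it is zero only under independence, so the quadratic dependence in the first call is detected even though it is nearly uncorrelated with e.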

smooth

If TRUE, a smooth estimate is computed for every covariate included in the model (except factors). Each variable then enters the model either linearly or through its smooth estimate; when both fits are equivalent, the linear term is kept.

measure

Measure of correct classification (accuracy by default).
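Accuracy, the default measure, is the proportion of correctly classified observations. A base R sketch with made-up observed and predicted labels:

```r
yobs  <- factor(c(0, 1, 1, 0, 1, 0))  # hypothetical observed classes
ypred <- factor(c(0, 1, 0, 0, 1, 1))  # hypothetical predicted classes

table(observed = yobs, predicted = ypred)  # confusion matrix
accuracy <- mean(yobs == ypred)            # 4 of 6 correct
accuracy
```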

xydist

List with the distance matrix of each variable (all potential covariates and the response), computed between its own observations.
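A base R sketch of what such a list might contain for simulated data: an n x n matrix per variable. The 0/1 mismatch distance for the class labels, the Riemann approximation of the L2 distance between curves, and the element names are all assumptions of this illustration; the package may use its own metrics (and its own naming) for functional data.

```r
set.seed(2)
n  <- 10
y  <- factor(sample(c(0, 1), n, replace = TRUE))      # class labels
z  <- rnorm(n)                                        # scalar covariate
tt <- seq(0, 1, length.out = 50)                      # discretisation grid
X  <- t(replicate(n, sin(2 * pi * tt) + rnorm(1)))    # n curves, one per row
h  <- tt[2] - tt[1]                                   # grid spacing

xydist <- list(
  y = 1 * outer(as.character(y), as.character(y), "!="),  # 0/1 label distance
  z = as.matrix(dist(z)),                                  # |z_i - z_j|
  X = as.matrix(dist(X)) * sqrt(h)                         # approx. L2 distance
)
sapply(xydist, dim)
```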

...

Further arguments passed to or from other methods.

Author

Febrero-Bande, M. and Oviedo de la Fuente, M.

References

Febrero-Bande, M., González-Manteiga, W. and Oviedo de la Fuente, M. (2018). Variable selection in functional additive regression models. Computational Statistics, 1-19. DOI: 10.1007/s00180-018-0844-5

Examples


if (FALSE) {
data(tecator)
x <- tecator$absorp.fdata
x1 <- fdata.deriv(x)
x2 <- fdata.deriv(x,nderiv=2)
y <- factor(ifelse(tecator$y$Fat<12,0,1))
set.seed(1)  # reproducible noise factor
xcat0 <- cut(rnorm(length(y)),4)
xcat1 <- cut(tecator$y$Protein,4)
xcat2 <- cut(tecator$y$Water,4)
ind <- 1:129
dat <- data.frame("Fat"=y, x1$data, xcat0, xcat1, xcat2)
ldat <- ldata("df"=dat[ind,],"x"=x[ind,],"x1"=x1[ind,],"x2"=x2[ind,])
# 3 functionals (x,x1,x2), 3 factors (xcat0 is pure noise, xcat1, xcat2)
# and 100 scalars (discretization points of x1)

res.gam <- classif.gsam(Fat~s(x),data=ldat)
summary(res.gam)

# Time consuming
res.gam.vs <- classif.gsam.vs(data=ldat, y="Fat")
summary(res.gam.vs)
res.gam.vs$i.predictor
res.gam.vs$ipredictor

# Prediction 
newldat <- ldata("df"=dat[-ind,],"x"=x[-ind,],
                "x1"=x1[-ind,],"x2"=x2[-ind,])
pred.gam <- predict(res.gam,newldat)                
pred.gam.vs <- predict(res.gam.vs,newldat)
cat2meas(newldat$df$Fat, pred.gam)
cat2meas(newldat$df$Fat, pred.gam.vs)
}
