
Sieve (version 2.1)

sieve_solver: Calculate the coefficients for the basis functions

Description

This is the main function that performs sieve estimation. It calculates the coefficients by solving a penalized lasso-type problem.

Usage

sieve_solver(
  model,
  Y,
  l1 = TRUE,
  family = "gaussian",
  lambda = NULL,
  nlambda = 100
)

Value

a list. In addition to the preprocessing information, it also contains the fitted coefficients.

Phi

a matrix. This is the design matrix used directly by the subsequent model-fitting step. The (i,j)-th element is the j-th basis function evaluated at the i-th sample's features. The dimension of this matrix is sample size x basisN.
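
To make the layout of Phi concrete, here is a minimal base-R sketch for a single feature. It assumes a cosine basis of the form 1, sqrt(2)*cos(pi*x), sqrt(2)*cos(2*pi*x), ... on [0, 1]; that formula and the helper name cosine_design are illustrative assumptions, not taken from the package source.

```r
# Hypothetical helper (not part of Sieve): evaluate the first basisN
# cosine basis functions at x, assuming x is already rescaled to [0, 1].
cosine_design <- function(x, basisN) {
  # Row i corresponds to sample i; column j to basis function j.
  outer(x, seq_len(basisN), function(xi, j) {
    ifelse(j == 1, 1, sqrt(2) * cos((j - 1) * pi * xi))
  })
}

x <- c(0, 0.25, 0.5, 1)          # four rescaled samples
Phi <- cosine_design(x, basisN = 3)
dim(Phi)                          # sample size x basisN
```

The first column is constant, so it plays the role of an intercept in this sketch.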

X

a matrix. This is the rescaled original feature/predictor matrix.

beta_hat

a matrix. Dimension is basisN x nlambda. The j-th column contains the fitted regression coefficients for the j-th hyperparameter in lambda.
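
The following sketch shows how the dimensions of Phi and beta_hat fit together. It uses random stand-in matrices rather than a real fit, and it assumes the gaussian case in which fitted values are linear combinations of the basis functions.

```r
set.seed(1)
n <- 5        # sample size
basisN <- 3   # number of basis functions
nlambda <- 4  # number of penalization hyperparameters

# Stand-ins for the Phi and beta_hat elements of the returned list.
Phi <- matrix(rnorm(n * basisN), nrow = n)
beta_hat <- matrix(rnorm(basisN * nlambda), nrow = basisN)

# One column of fitted values per hyperparameter: n x nlambda.
fitted_all <- Phi %*% beta_hat

# Fitted values for the second hyperparameter only.
fitted_2 <- drop(Phi %*% beta_hat[, 2])
```

Selecting a column of beta_hat is therefore equivalent to selecting one value of lambda.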

type

a string. The type of basis function.

index_matrix

a matrix. It specifies which product basis functions are used when constructing the design matrix Phi. Its dimension is basisN x (dimension of the original features). Each row has at most interaction_order elements that are not equal to 1.
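
As a rough illustration of how such an index matrix can define product basis functions, the sketch below builds a small design matrix for two features. The univariate basis psi, the hand-written index_matrix and the convention that index 1 denotes the constant function are assumptions made for this example; the package's internal construction may differ.

```r
# Hypothetical univariate basis: index 1 is the constant function.
psi <- function(x, k) {
  if (k == 1) rep(1, length(x)) else sqrt(2) * cos((k - 1) * pi * x)
}

# Each row is one product basis function; each column is one feature.
index_matrix <- rbind(c(1, 1),   # constant
                      c(2, 1),   # varies with feature 1 only
                      c(1, 2),   # varies with feature 2 only
                      c(2, 2))   # interaction of features 1 and 2

# Three samples, two features, already rescaled to [0, 1].
X <- rbind(c(0, 0), c(0.5, 1), c(1, 0.25))

# Column j of Phi is the product of the univariate basis functions
# selected by row j of index_matrix.
Phi <- sapply(seq_len(nrow(index_matrix)), function(j) {
  psi(X[, 1], index_matrix[j, 1]) * psi(X[, 2], index_matrix[j, 2])
})
dim(Phi)  # samples x number of product basis functions
```

In this sketch, a row with two non-1 entries corresponds to a second-order interaction term.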

basisN

a number. Number of sieve basis functions.

norm_para

a matrix. It records how each dimension of the feature/predictor is rescaled, which is useful when rescaling the testing sample's predictors.

lambda

a vector. It records the penalization hyperparameters used when solving the lasso problems. By default it has length 100, meaning the algorithm tried 100 different penalization hyperparameters.

family

a string. 'gaussian', continuous numerical outcome, regression problem; 'binomial', binary outcome, classification problem.

Arguments

model

a list. Typically, it is the output of Sieve::sieve_preprocess.

Y

a vector. The outcome variable. The length of Y equals the training sample size, which should also match the number of rows of X in model.

l1

a logical variable. TRUE means calculating the coefficients by solving an l1-penalized empirical risk minimization problem. FALSE means solving a least-squares problem. Default is TRUE.
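
For intuition about the unpenalized case, here is a base-R sketch of an ordinary least-squares fit on a small cosine design matrix. It illustrates what a least-squares problem over basis coefficients looks like; the simulated data, the basis formula and the use of qr.solve are assumptions for this example and are not the package's internal code.

```r
set.seed(42)
n <- 200
x <- runif(n)                               # one feature on [0, 1]
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)   # noisy nonlinear signal

# Assumed cosine basis: a constant column plus sqrt(2) * cos((j - 1) * pi * x).
basisN <- 8
Phi <- outer(x, seq_len(basisN), function(xi, j) {
  ifelse(j == 1, 1, sqrt(2) * cos((j - 1) * pi * xi))
})

# Least-squares coefficients: minimize sum((y - Phi %*% beta)^2).
beta_ls <- qr.solve(Phi, y)
fitted <- drop(Phi %*% beta_ls)
mean((y - fitted)^2)   # training mean squared error
```

An l1 penalty adds a term proportional to the sum of absolute coefficient values to this objective, which shrinks some coefficients to exactly zero.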

family

a string. 'gaussian', mean-squared-error regression problem; 'binomial', binary outcome, classification problem. Default is 'gaussian'.

lambda

the same as the lambda argument of glmnet::glmnet. Default is NULL.

nlambda

a number. Number of penalization hyperparameters used when solving the lasso-type problem. Default is 100.

Examples

library(Sieve)

xdim <- 1  # 1-dimensional feature
# generate 1000 training samples
TrainData <- GenSamples(s.size = 1000, xdim = xdim)
# use 50 cosine basis functions
type <- 'cosine'
basisN <- 50
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)],
                                basisN = basisN, type = type)
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y)

# If the outcome is binary, solve a nonparametric
# logistic regression problem instead.
xdim <- 1
TrainData <- GenSamples(s.size = 1e3, xdim = xdim, y.type = 'binary',
                        frho = 'nonlinear_binary')
sieve.model <- sieve_preprocess(X = TrainData[,2:(xdim+1)],
                                basisN = basisN, type = type)
sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y,
                          family = 'binomial')
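
After fitting, the elements listed under Value can be inspected directly. The sketch below refits a small gaussian model and prints the sizes of a few of them. It assumes the Sieve package is installed and does nothing otherwise; the sample size and basisN are arbitrary choices for illustration.

```r
if (requireNamespace("Sieve", quietly = TRUE)) {
  library(Sieve)
  set.seed(1)

  xdim <- 1
  TrainData <- GenSamples(s.size = 500, xdim = xdim)
  sieve.model <- sieve_preprocess(X = TrainData[, 2:(xdim + 1)],
                                  basisN = 20, type = "cosine")
  sieve.fit <- sieve_solver(model = sieve.model, Y = TrainData$Y)

  print(dim(sieve.fit$Phi))        # design matrix: sample size x basisN
  print(dim(sieve.fit$beta_hat))   # one column of coefficients per lambda
  print(length(sieve.fit$lambda))  # number of hyperparameters tried
}
```

Each column of beta_hat pairs with the entry of lambda in the same position, so a hyperparameter can be chosen by column index.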
