samEL: Training function of Sparse Additive Poisson Regression

Description

Fits a sparse additive log-linear (Poisson) model to the training data.

Usage

samEL(
  X,
  y,
  p = 3,
  lambda = NULL,
  nlambda = NULL,
  lambda.min.ratio = 0.25,
  thol = 1e-05,
  max.ite = 1e+05,
  regfunc = "L1"
)

Arguments

X

The n by d design matrix of the training set, where n is the sample size and d is the dimension.

y

The n-dimensional response vector of the training set, where n is the sample size. Responses must be non-negative integers.

p

The number of basis spline functions. The default value is 3.

lambda

A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda. Supply instead a decreasing sequence of lambda values. samEL relies on its warm starts for speed, and it is often faster to fit a whole path than to compute a single fit.

nlambda

The number of lambda values. The default value is 20.

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). The default is 0.25, as shown in the Usage signature.

thol

Stopping precision. The default value is 1e-5.

max.ite

The maximum number of iterations. The default value is 1e5.

regfunc

A string indicating the regularizer. The default value is "L1". You can also assign "MCP" or "SCAD" to it.
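The relationship between nlambda, lambda.min.ratio, and the lambda sequence can be illustrated with a small sketch. The helper below, `make_lambda_seq`, is hypothetical and not part of SAM; the log-spaced grid and the value used for lambda.max are assumptions made for illustration, and the spacing samEL uses internally may differ.

```r
# Hypothetical helper (not part of SAM): build a decreasing, log-spaced
# lambda sequence running from lambda.max down to
# lambda.min.ratio * lambda.max.
make_lambda_seq <- function(lambda.max, nlambda = 20, lambda.min.ratio = 0.25) {
  stopifnot(lambda.max > 0, nlambda >= 2,
            lambda.min.ratio > 0, lambda.min.ratio < 1)
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

# lambda.max = 2 is an arbitrary illustrative value
lambda_seq <- make_lambda_seq(lambda.max = 2)
range(lambda_seq)  # smallest and largest lambda in the sequence
```

A decreasing sequence built this way could be passed through the lambda argument, which matches the advice above to supply a whole path rather than a single value.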

Value

p

The number of basis spline functions used in training.

X.min

A vector with each entry corresponding to the minimum of each input variable. (Used for rescaling in testing)

X.ran

A vector with each entry corresponding to the range of each input variable. (Used for rescaling in testing)

lambda

The sequence of regularization parameters used in training.

w

The solution path matrix (d*p+1 by length of lambda) with each column corresponding to a regularization parameter. Since we use the basis expansion with the intercept, the length of each column is d*p+1.

df

The degrees of freedom of the solution path (the number of non-zero component functions).

knots

The p-1 by d matrix. Each column contains the knots applied to the corresponding variable.

Boundary.knots

The 2 by d matrix. Each column contains the boundary points applied to the corresponding variable.

func_norm

The functional norm matrix (d by length of lambda) with each column corresponding to a regularization parameter. Since we have d input variables, the length of each column is d.
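The X.min and X.ran entries are described as being used for rescaling in testing. A minimal base-R sketch of that kind of rescaling follows. The variable names and the simulated matrices are illustrative assumptions, and the exact steps samEL and its predict method perform internally are not shown in this page.

```r
# Sketch: rescale new inputs with the training minimum and range.
set.seed(1)
X_train <- matrix(runif(20), 5, 4)   # illustrative training inputs
X_new   <- matrix(runif(12), 3, 4)   # illustrative test inputs

X_min <- apply(X_train, 2, min)            # plays the role of X.min
X_ran <- apply(X_train, 2, max) - X_min    # plays the role of X.ran

# sweep() applies the per-column shift, then the per-column scale
X_scaled <- sweep(sweep(X_new, 2, X_min, "-"), 2, X_ran, "/")

# Rescaling the training data itself maps every column onto [0, 1]
X_train_scaled <- sweep(sweep(X_train, 2, X_min, "-"), 2, X_ran, "/")
range(X_train_scaled)
```

Test inputs rescaled this way can fall slightly outside [0, 1] when they lie beyond the range seen in training.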

Details

We adopt various computational algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by "warm-start" and "active-set" tricks.
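As an illustration of the soft-thresholding idea mentioned above, the sketch below shows a group soft-thresholding operator, a common building block when a group-sparse penalty is applied to the block of basis coefficients belonging to one variable. The function `group_soft_threshold` is a hypothetical helper, not a SAM function, and this page does not confirm that samEL uses exactly this update.

```r
# Hypothetical helper (not part of SAM): group soft-thresholding of one
# coefficient block, as used in many group-sparse solvers.
group_soft_threshold <- function(beta, lambda) {
  nrm <- sqrt(sum(beta^2))          # Euclidean norm of the block
  if (nrm <= lambda) {
    return(rep(0, length(beta)))    # the whole block is set to zero
  }
  (1 - lambda / nrm) * beta         # otherwise shrink the block toward zero
}

group_soft_threshold(c(3, 4), lambda = 1)      # norm 5 is shrunk to norm 4
group_soft_threshold(c(0.3, 0.4), lambda = 1)  # norm 0.5 is below lambda
```

Zeroing an entire block at once is what removes a variable's component function from the model, which is how the number of non-zero component functions can shrink as lambda grows.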

Examples

# NOT RUN {
## load the package that provides samEL
library(SAM)

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
u = exp(-2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1+1)
y = rep(0,n)
for(i in 1:n) y[i] = rpois(1,u[i])

## Training
out.trn = samEL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
ut = exp(-2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1+1)
yt = rep(0,nt)
for(i in 1:nt) yt[i] = rpois(1,ut[i])

## predicting response
out.tst = predict(out.trn,Xt)
# }
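Once predictions are available, one way to compare fits along the path is the mean Poisson deviance on held-out data. The sketch below is self-contained base R with simulated values. The helper `poisson_deviance` is hypothetical, and how predicted means are extracted from the object returned by predict is left as an assumption, since this page does not describe that object.

```r
# Hypothetical helper (not part of SAM): mean Poisson deviance between
# observed counts y and predicted means mu.
poisson_deviance <- function(y, mu) {
  stopifnot(length(y) == length(mu), all(mu > 0))
  # y * log(y / mu) is taken to be 0 when y == 0
  term <- ifelse(y == 0, 0, y * log(y / mu))
  2 * mean(term - (y - mu))
}

# Simulated illustration: true means versus a constant-mean baseline
set.seed(1)
mu_true <- exp(runif(500))
y_sim   <- rpois(500, mu_true)

poisson_deviance(y_sim, mu_true)                 # deviance under the true means
poisson_deviance(y_sim, rep(mean(y_sim), 500))   # deviance under a constant mean
```

Lower values indicate a closer fit, so applying the same function to each column of predicted means would give one deviance per regularization parameter.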
