fsim.kernel.fit.optim: Functional single-index model fit using kernel estimation and iterative LOOCV minimisation

Description

This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kernel estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.

The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the bandwidth (h.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs an iterative minimisation of the LOOCV objective function, starting from an initial set of coefficients (gamma) for the functional index.

Usage

fsim.kernel.fit.optim(x, y, nknot.theta = 3, order.Bspline = 3, gamma = NULL, 
min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10,
kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, threshold = 0.005)

Value

call: The matched call.
fitted.values: Estimated scalar response.
residuals: Differences between y and the fitted.values.
theta.est: Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta.
h.opt: Selected bandwidth.
r.squared: Coefficient of determination.
var.res: Residual variance.
df: Residual degrees of freedom.
CV.opt: Minimum value of the LOOCV function, i.e. the value of LOOCV for theta.est and h.opt.
err: Value of the LOOCV function divided by var(y) for each iteration.
H: Hat matrix.
h.seq: Sequence of eligible values for the bandwidth.
CV.hseq: CV values for each h.
...

Arguments

x: Matrix containing the observations of the functional covariate (i.e. curves) collected by row.
y: Vector containing the scalar response.
order.Bspline: Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.
nknot.theta: Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3.
gamma: Vector indicating the initial coefficients for the functional index used in the iterative procedure. By default, it is a vector of ones. The size of the vector is determined by the sum nknot.theta+order.Bspline.
min.q.h: Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05.
max.q.h: Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5.
h.seq: Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range constructed using min.q.h and max.q.h.
num.h: Positive integer indicating the number of bandwidths in the grid. The default is 10.
kind.of.kernel: The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available.
range.grid: Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)).
nknot: Positive integer indicating the number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is (p - order.Bspline - 1)%/%2.
threshold: The convergence threshold for the LOOCV function (scaled by the variance of the response). The default is 5e-3.
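
As a rough illustration of how the defaults described above could be derived, the following base-R sketch evaluates the stated default for nknot and builds an equispaced bandwidth grid between two quantiles of the pairwise projection distances. The projections are simulated and the helper name bandwidth_grid is hypothetical (it is not part of the package); the internal computation of the package may differ in detail.

```r
# Default number of interior knots for the covariate expansion,
# following the formula stated for the nknot argument.
p <- 100                 # assumed discretisation size, i.e. ncol(x)
order.Bspline <- 3
nknot.default <- (p - order.Bspline - 1) %/% 2

# Hypothetical helper (not a package function): num.h equispaced bandwidths
# between the min.q.h and max.q.h quantiles of the pairwise distances
# |<theta, x_i - x_j>|, computed here from the scalar projections.
bandwidth_grid <- function(proj, min.q.h = 0.05, max.q.h = 0.5, num.h = 10) {
  d <- as.vector(dist(proj))   # |proj_i - proj_j| for all pairs i < j
  ends <- quantile(d, probs = c(min.q.h, max.q.h), names = FALSE)
  seq(ends[1], ends[2], length.out = num.h)
}

set.seed(1)
proj <- rnorm(50)        # simulated projections <theta, X_i>
h.grid <- bandwidth_grid(proj)
```

Because the distances depend on the functional index, a grid built this way changes whenever the index changes, which is one reason to pass h.seq explicitly when a fixed grid is wanted.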

Author

German Aneiros Perez german.aneiros@udc.es

Silvia Novo Diaz snovo@est-econ.uc3m.es

Details

The functional single-index model (FSIM) is given by the expression: $$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,$$ where $Y_i$ denotes a scalar response and $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0 \in \mathcal{H}$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.

The FSIM is fitted using the kernel estimator $$ \widehat{r}_{h,\hat{\theta}}(x)=\sum_{i=1}^nw_{n,h,\hat{\theta}}(x,X_i)Y_i, \quad \forall x\in\mathcal{H}, $$ with Nadaraya-Watson weights $$ w_{n,h,\hat{\theta}}(x,X_i)=\frac{K\left(h^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)}{\sum_{j=1}^nK\left(h^{-1}d_{\hat{\theta}}\left(X_j,x\right)\right)}, $$ where

the real positive number $h$ is the bandwidth.
$K$ is a kernel function (see the argument kind.of.kernel).
$d_{\hat{\theta}}(x_1,x_2)=|\langle\hat{\theta},x_1-x_2\rangle|$ is the projection semi-metric, and $\hat{\theta}$ is an estimate of $\theta_0$.
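
The estimator and weights above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the projections $\langle\hat{\theta},X_i\rangle$ are simulated as scalars, the kernel is taken to be $1-u^2$ on $[0,1]$ up to a constant (which cancels in the weights), and the helper names quad_kernel and nw_loo are hypothetical rather than package functions.

```r
# Epanechnikov-type ("quad") kernel supported on [0, 1]; the normalising
# constant cancels in the Nadaraya-Watson ratio, so it is omitted here.
quad_kernel <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)

# Hypothetical helper: leave-one-out predictions of the kernel estimator,
# given scalar projections proj_i = <theta, X_i> and a bandwidth h.
nw_loo <- function(proj, y, h) {
  D <- abs(outer(proj, proj, "-"))   # projection semi-metric d_theta(X_i, X_j)
  K <- quad_kernel(D / h)
  diag(K) <- 0                       # leave observation i out of its own fit
  W <- K / rowSums(K)                # Nadaraya-Watson weights, rows sum to 1
  as.vector(W %*% y)
}

set.seed(1)
n <- 100
proj <- runif(n)                                # simulated projections
y <- sin(2 * pi * proj) + rnorm(n, sd = 0.1)    # assumed link plus noise
h <- 0.2
y.loo <- nw_loo(proj, y, h)
cv <- mean((y - y.loo)^2)                       # LOOCV criterion at this h
```

Setting the diagonal of the kernel matrix to zero is what turns the ordinary fit into a leave-one-out fit; the ratio cv / var(y) is on the same scale as the err component described under Value.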

The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected bandwidth (h.opt) by minimising the LOOCV criterion. This function performs an iterative minimisation procedure, starting from an initial set of coefficients (gamma) for the functional index. Given a functional index, the optimal bandwidth according to the LOOCV criterion is selected. For a given bandwidth, the minimisation in the functional index is performed using the R function optim. The procedure is iterated until convergence. For details, see Ferraty et al. (2013).
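
The alternating scheme described above can be sketched as follows. This is a toy version on simulated curves, not the package's implementation: the helper names (quad_kernel, project, loocv), the rescaling of the index to unit norm, the Riemann approximation of the inner product, and the stopping rule on successive values of LOOCV/var(y) are all assumptions made for the illustration. The current bandwidth is kept among the candidates so that the criterion cannot increase between iterations.

```r
library(splines)  # ships with R; used only to build the B-spline basis

quad_kernel <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)

# Projections <theta, X_i>, with theta = B %*% gamma rescaled to unit norm
# (a normalisation assumed here so that theta and h are not confounded).
project <- function(gamma, X, B) {
  theta <- as.vector(B %*% gamma)
  theta <- theta / sqrt(mean(theta^2))
  as.vector(X %*% theta) / ncol(X)   # Riemann approximation of the integral
}

# LOOCV criterion for a coefficient vector gamma and a bandwidth h.
loocv <- function(gamma, h, X, y, B) {
  proj <- project(gamma, X, B)
  K <- quad_kernel(abs(outer(proj, proj, "-")) / h)
  diag(K) <- 0
  s <- rowSums(K)
  # Finite penalty when some curve has no neighbour within h (or theta is null).
  if (any(!is.finite(s)) || any(s == 0)) return(1e6 * var(y))
  mean((y - as.vector(K %*% y) / s)^2)
}

# Simulated data: n rough curves on p grid points and a scalar response.
set.seed(1)
n <- 60; p <- 50
t.grid <- seq(0, 1, length.out = p)
X <- t(replicate(n, cumsum(rnorm(p)) / sqrt(p)))
theta0 <- sin(pi * t.grid)
y <- exp(as.vector(X %*% theta0) / p) + rnorm(n, sd = 0.05)

# B-spline basis for the index: order 3 (degree 2) and 3 interior knots.
order.Bspline <- 3; nknot.theta <- 3
B <- bs(t.grid, df = nknot.theta + order.Bspline,
        degree = order.Bspline - 1, intercept = TRUE)

gamma <- rep(1, ncol(B))   # initial coefficients: a vector of ones
h <- NULL
err <- numeric(0)
threshold <- 0.005
for (iter in 1:20) {
  # Step 1: given the index, select h on a grid of distance quantiles.
  d <- as.vector(dist(project(gamma, X, B)))
  q <- quantile(d, probs = c(0.05, 0.5), names = FALSE)
  h.cand <- c(h, seq(q[1], q[2], length.out = 10))
  cv.h <- sapply(h.cand, function(hh) loocv(gamma, hh, X, y, B))
  h <- h.cand[which.min(cv.h)]
  # Step 2: given h, minimise over the index coefficients (Nelder-Mead).
  opt <- optim(gamma, function(g) loocv(g, h, X, y, B))
  gamma <- opt$par
  err <- c(err, opt$value / var(y))
  if (iter > 1 && abs(err[iter - 1] - err[iter]) < threshold) break
}
```

At exit, gamma and h play the roles of theta.est and h.opt, and err records the scaled criterion per iteration. Each iteration calls a derivative-free optimiser on an objective that costs on the order of n^2 operations, which is why the real function can take noticeable time on data such as Tecator.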

References

Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2013) Functional projection pursuit regression. Test, 22, 293-320, https://doi.org/10.1007/s11749-012-0306-2.

Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364-392, https://doi.org/10.1080/10485252.2019.1567726.

Examples


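library(fsemipar)

data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit on the first 160 curves, timed with proc.time().
ptm <- proc.time()
fit <- fsim.kernel.fit.optim(x = X[1:160, ], y = y[1:160], max.q.h = 0.35,
                             nknot = 20, range.grid = c(850, 1050), nknot.theta = 4)
proc.time() - ptm

# Print the fitted object and list its components.
fit
names(fit)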
