fsim.kernel.fit: Functional single-index model fit using kernel estimation and joint LOOCV minimisation

Description

This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kernel estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.

The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the bandwidth (h.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs a joint minimisation of the LOOCV objective function in both the bandwidth and the functional index.

Usage

fsim.kernel.fit(x, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3, 
nknot.theta = 3,  min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10, 
kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, n.core = NULL)

Value

call: The matched call.
fitted.values: Estimated scalar response.
residuals: Differences between y and the fitted.values.
theta.est: Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta.
h.opt: Selected bandwidth.
r.squared: Coefficient of determination.
var.res: Residual variance.
df: Residual degrees of freedom.
yhat.cv: Predicted values for the scalar response using leave-one-out samples.
CV.opt: Minimum value of the CV function, i.e. the value of CV for theta.est and h.opt.
CV.values: Vector containing CV values for each functional index in $\Theta_n$ and the value of $h$ that minimises the CV for such index (i.e. CV.values[j] contains the value of the CV function corresponding to theta.seq.norm[j,] and the best value of the h.seq for this functional index according to the CV criterion).
H: Hat matrix.
m.opt: Index of $\hat{\theta}$ in the set $\Theta_n$.
theta.seq.norm: Matrix whose jth row, theta.seq.norm[j,], contains the coefficients in the B-spline basis of the jth functional index in $\Theta_n$.
h.seq: Sequence of eligible values for $h$.
...

Arguments

x: Matrix containing the observations of the functional covariate (i.e. curves) collected by row.
y: Vector containing the scalar response.
seed.coeff: Vector of initial values used to build the set $\Theta_n$ (see section Details). The coefficients for the B-spline representation of each eligible functional index $\theta \in \Theta_n$ are obtained from seed.coeff. The default is c(-1,0,1).
order.Bspline: Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.
nknot.theta: Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3.
min.q.h: Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05.
max.q.h: Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5.
h.seq: Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range constructed using min.q.h and max.q.h.
num.h: Positive integer indicating the number of bandwidths in the grid. The default is 10.
kind.of.kernel: The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available.
range.grid: Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)).
nknot: Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is (p - order.Bspline - 1)%/%2.
n.core: Number of CPU cores designated for parallel execution. The default is n.core<-availableCores(omit=1).
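The default bandwidth grid described under h.seq, min.q.h, max.q.h and num.h can be illustrated with a rough base-R sketch. It assumes that the distances are computed for a single fixed index (here a hypothetical constant function) and that the inner product is approximated by a Riemann sum; the package may build the grid differently, and all object names below are illustrative.

```r
# Sketch only (base R); not the fsemipar implementation.
set.seed(7)
n <- 30; p <- 20
grid <- seq(0, 1, length.out = p)
step <- grid[2] - grid[1]
X <- matrix(rnorm(n * p), n, p)            # toy curves, one per row
theta <- rep(1, p)                         # hypothetical index on the grid

# Pairwise projection semi-metric distances |<theta, X_i - X_j>|
proj <- as.vector(X %*% theta) * step
d <- as.vector(dist(proj))

# Equispaced bandwidths between the min.q.h and max.q.h quantiles of d
min_q_h <- 0.05; max_q_h <- 0.5; num_h <- 10
h_range <- quantile(d, c(min_q_h, max_q_h), names = FALSE)
h_seq <- seq(h_range[1], h_range[2], length.out = num_h)
```

Narrowing the quantile range (for instance a smaller max.q.h) concentrates the same number of candidate bandwidths on smaller neighbourhoods.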

Author

German Aneiros Perez german.aneiros@udc.es

Silvia Novo Diaz snovo@est-econ.uc3m.es

Details

The functional single-index model (FSIM) is given by the expression: $$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,$$ where $Y_i$ denotes a scalar response and $X_i$ is a functional covariate taking values in a separable Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0 \in \mathcal{H}$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.

The FSIM is fitted using the kernel estimator $$ \widehat{r}_{h,\hat{\theta}}(x)=\sum_{i=1}^nw_{n,h,\hat{\theta}}(x,X_i)Y_i, \quad \forall x\in\mathcal{H}, $$ with Nadaraya-Watson weights $$ w_{n,h,\hat{\theta}}(x,X_i)=\frac{K\left(h^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)}{\sum_{j=1}^nK\left(h^{-1}d_{\hat{\theta}}\left(X_j,x\right)\right)}, $$ where

the real positive number $h$ is the bandwidth.
$K$ is a kernel function (see the argument kind.of.kernel).
$d_{\hat{\theta}}(x_1,x_2)=|\langle\hat{\theta},x_1-x_2\rangle|$ is the projection semi-metric, and $\hat{\theta}$ is an estimate of $\theta_0$.
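The estimator above can be sketched in a few lines of base R. This is a minimal illustration, not the package code: it assumes curves stored by row on an equispaced grid, a Riemann-sum approximation of the inner product, and a hypothetical constant index; the helper names epanechnikov and nw_weights are made up for the example.

```r
# Sketch only (base R); not the fsemipar implementation.

# Epanechnikov-type kernel supported on [0, 1]
epanechnikov <- function(u) ifelse(u >= 0 & u <= 1, 1.5 * (1 - u^2), 0)

# Nadaraya-Watson weights w(x_new, X_i) under the projection semi-metric
nw_weights <- function(x_new, X, theta, h, step) {
  d <- abs(as.vector(sweep(X, 2, x_new) %*% theta) * step)  # |<theta, X_i - x_new>|
  k <- epanechnikov(d / h)
  k / sum(k)   # NaN if no curve lies within distance h of x_new
}

# Toy data: n curves observed at p equispaced points
set.seed(1)
n <- 50; p <- 30
grid <- seq(0, 1, length.out = p)
step <- grid[2] - grid[1]
X <- t(replicate(n, rnorm(1) * sin(2 * pi * grid) + rnorm(1) * grid))
theta <- rep(1, p)                          # hypothetical index
proj <- as.vector(X %*% theta) * step
y <- proj^2 + rnorm(n, sd = 0.05)

# Bandwidth taken as a quantile of the pairwise distances (an assumption)
h <- quantile(as.vector(dist(proj)), 0.3, names = FALSE)

# Prediction at the first curve: weighted average of the responses
w <- nw_weights(X[1, ], X, theta, h, step)
y_hat <- sum(w * y)
```

Because the weights are non-negative and sum to one, the prediction is a local average of the responses whose curves project close to the projection of the new curve.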

The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). Then, we build a set $\Theta_n$ of eligible functional indexes by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The more values seed.coeff contains, the larger $\Theta_n$ becomes. Since our approach requires intensive computation, a trade-off between the size of $\Theta_n$ and the performance of the estimator is necessary. To that end, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of $\Theta_n$, see Novo et al. (2019).
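To give a feel for the size of the candidate set, the following sketch builds a B-spline basis of dimension nknot.theta+order.Bspline with the splines package (shipped with R) and enumerates every combination of seed coefficients. The rescaling to unit norm is only a stand-in for the calibration step, which is an assumption of this example; the exact construction is the one in Novo et al. (2019). All object names are illustrative.

```r
# Sketch only; not the fsemipar implementation.
library(splines)

order_bspline <- 3
nknot_theta <- 3
seed_coeff <- c(-1, 0, 1)
grid <- seq(0, 1, length.out = 101)
step <- grid[2] - grid[1]

# Regularly spaced interior knots; basis dimension = nknot_theta + order_bspline
interior <- seq(0, 1, length.out = nknot_theta + 2)[-c(1, nknot_theta + 2)]
B <- bs(grid, knots = interior, degree = order_bspline - 1,
        intercept = TRUE, Boundary.knots = c(0, 1))
dim_theta <- ncol(B)

# Every combination of seed coefficients, one coefficient per basis function
candidates <- as.matrix(expand.grid(rep(list(seed_coeff), dim_theta)))
candidates <- candidates[rowSums(candidates != 0) > 0, , drop = FALSE]  # drop the zero index

# Assumed calibration: rescale each candidate so that <theta, theta> = 1
theta_vals <- B %*% t(candidates)              # one candidate index per column
norms <- sqrt(colSums(theta_vals^2) * step)
candidates_norm <- candidates / norms          # row j divided by norms[j]

n_candidates <- nrow(candidates_norm)
```

With the defaults there are 3^6 combinations before any filtering, and each extra seed value or knot multiplies the number of LOOCV evaluations, which is why small values are recommended.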

We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected bandwidth (h.opt) by minimising the LOOCV criterion. This function performs a joint minimisation in both parameters, the bandwidth and the functional index, and supports parallel computation. To avoid parallel computation, we can set n.core=1.
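The joint minimisation can be pictured as a search over a grid with one row per candidate index and one column per bandwidth. The sketch below is a simplified base-R illustration under stated assumptions: toy data, three hypothetical candidate indexes evaluated on the grid, a fixed bandwidth grid, and a criterion set to Inf whenever some observation has no neighbour within the bandwidth. The names loocv, Theta, m_opt and h_opt are chosen for the example and only loosely mirror the returned components.

```r
# Sketch only (base R); not the fsemipar implementation.
set.seed(42)
n <- 40; p <- 25
grid <- seq(0, 1, length.out = p)
step <- grid[2] - grid[1]
X <- t(replicate(n, rnorm(1) * sin(2 * pi * grid) + rnorm(1) * cos(2 * pi * grid)))
theta_true <- sqrt(2) * sin(2 * pi * grid)
y <- as.vector(X %*% theta_true * step)^2 + rnorm(n, sd = 0.05)

# Hypothetical candidate indexes (one per row) and bandwidth grid
Theta <- rbind(sqrt(2) * sin(2 * pi * grid),
               sqrt(2) * cos(2 * pi * grid),
               rep(1, p))
h_seq <- seq(0.05, 0.5, length.out = 10)

kern <- function(u) ifelse(u >= 0 & u <= 1, 1.5 * (1 - u^2), 0)

# Leave-one-out CV error for one (theta, h) pair
loocv <- function(theta, h) {
  proj <- as.vector(X %*% theta) * step
  W <- kern(abs(outer(proj, proj, "-")) / h)
  diag(W) <- 0                      # observation i is excluded from its own fit
  den <- rowSums(W)
  if (any(den == 0)) return(Inf)    # some curve has no neighbour within h
  mean((y - as.vector(W %*% y) / den)^2)
}

# CV[j, k]: criterion for candidate j and bandwidth k
CV <- sapply(h_seq, function(h) apply(Theta, 1, loocv, h = h))

# Joint minimiser over both the index and the bandwidth
best <- which(CV == min(CV), arr.ind = TRUE)[1, ]
m_opt <- best[["row"]]
h_opt <- h_seq[best[["col"]]]
```

Each (index, bandwidth) pair is evaluated independently of the others, which is what makes the search straightforward to distribute across cores.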

References

Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475-494, https://doi.org/10.1080/02331880801980377.

Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364-392, https://doi.org/10.1080/10485252.2019.1567726.

Examples


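library(fsemipar)

# Tecator data: fat content (scalar response) and absorbance curves
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit on the first 160 observations, timed with proc.time()
ptm <- proc.time()
fit <- fsim.kernel.fit(x = X[1:160, ], y = y[1:160], max.q.h = 0.35, nknot = 20,
                       range.grid = c(850, 1050), nknot.theta = 4)
proc.time() - ptm

# Print the fit and list the components of the returned object
fit
names(fit)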
