The basis function $f_y$ is a vector-valued function of the response $y \in \mathrm{R}$. Infinitely many choices of basis function are possible, including polynomial, piecewise polynomial, and Fourier bases.
We implemented the following:
1. Polynomial basis: $f_y=(y, y^2, ..., y^r)^T$. It corresponds to the "poly" argument of bf. The argument degree, which is the degree $r$ of the polynomial, is provided by the user. The subsequent $n \times r$ data-matrix is column-wise centered.
2. Piecewise constant basis: It corresponds to "pdisc" with degree=0. It is obtained by first slicing the range of $y$ into $h$ slices $H_1,...,H_h$. The $k^{th}$ component of $f_y \in \mathrm{R}^{h-1}$ is $f_{y_k}=J(y \in H_k)-n_k/n, k=1, ..., h-1$, where $n_k$ is the number of observations in $H_k$, and $J$ is the indicator function. We suggest using between two and fifteen slices without exceeding $n/5$.
3. Piecewise discontinuous linear basis: It corresponds to "pdisc" with degree=1. It is more elaborate than the piecewise constant basis. A linear function of $y$ is fit within each slice. Let $\tau_i$ be the knots, or endpoints of the slices. The components of $f_y \in \mathrm{R}^{2h-1}$ are obtained with
$f_{y_{(2i-1)}} = J(y \in H_i)$; $f_{y_{2i}} = J(y \in H_i)(y-\tau_{i-1})$ for $i=1,2,...,h-1$ and
$f_{y_{(2h-1)}} = J(y \in H_{h})(y-\tau_{h-1})$. The subsequent $n \times (2h-1)$ data-matrix is column-wise centered. We suggest using fewer than fifteen slices without exceeding $n/5$.
4. Piecewise continuous linear basis: The general form of the components $f_{y_i}$ of
$f_y \in \mathrm{R}^{h+1}$ is given by $f_{y_1} = J(y \in H_1)$ and
$f_{y_{i+1}} = J(y \in H_{i})(y-\tau_{i-1})$ for $i=1,...,h$. The subsequent $n \times (h+1)$ data-matrix is column-wise centered.
This case corresponds to "pcont" with degree=1. The number of slices should not exceed $n/5$.
5. Fourier bases: They consist of a series of pairs of sines and cosines of increasing frequency.
A Fourier basis is given by $f_y=(\cos(2\pi y), \sin(2\pi y),..., \cos(2\pi ky), \sin(2\pi ky))^T.$
The subsequent $n \times 2k$ data-matrix is column-wise centered.
6. Categorical basis: It is obtained using the "categ" option when $y$ takes $h$ distinct values $1, 2,..., h$, corresponding to the number of sub-populations or sub-groups. The number of slices is naturally $h$. The expression for the basis is identical to that of the piecewise constant basis.
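The constructions listed above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the code behind bf: all function names are hypothetical, and the equal-width slicing rule is an assumption, since the text does not say how the slice boundaries are chosen. The sketch covers the polynomial, Fourier, piecewise constant, and piecewise discontinuous linear bases.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so every column of the data-matrix sums to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def poly_basis(y, r):
    """Rows (y, y^2, ..., y^r), column-wise centered."""
    return center_columns([[yi ** d for d in range(1, r + 1)] for yi in y])

def fourier_basis(y, k):
    """Rows (cos(2 pi y), sin(2 pi y), ..., cos(2 pi k y), sin(2 pi k y)), centered."""
    rows = []
    for yi in y:
        row = []
        for j in range(1, k + 1):
            row += [math.cos(2 * math.pi * j * yi), math.sin(2 * math.pi * j * yi)]
        rows.append(row)
    return center_columns(rows)

def slice_index(y, h):
    """Assign each y to one of h slices, labelled 0..h-1, and return the knots.

    Assumption: slices are equal-width intervals over the range of y,
    and y is not constant (otherwise the width would be zero).
    """
    lo, hi = min(y), max(y)
    width = (hi - lo) / h
    knots = [lo + i * width for i in range(h + 1)]  # tau_0, ..., tau_h
    idx = [min(int((yi - lo) / width), h - 1) for yi in y]  # max(y) goes in the last slice
    return idx, knots

def piecewise_constant_basis(y, h):
    """Rows of J(y in H_k) - n_k/n for k = 1, ..., h-1."""
    idx, _ = slice_index(y, h)
    rows = [[1.0 if s == k else 0.0 for k in range(h - 1)] for s in idx]
    return center_columns(rows)  # centering an indicator column subtracts n_k/n

def piecewise_disc_linear_basis(y, h):
    """Rows with 2h-1 components: an indicator and a slope term per slice,
    except the last slice, which contributes only its slope term."""
    idx, knots = slice_index(y, h)
    rows = []
    for yi, s in zip(y, idx):
        row = []
        for i in range(h - 1):  # slices H_1, ..., H_{h-1}
            inside = 1.0 if s == i else 0.0
            row += [inside, inside * (yi - knots[i])]
        row.append((yi - knots[h - 1]) if s == h - 1 else 0.0)  # slice H_h
        rows.append(row)
    return center_columns(rows)
```

For instance, `poly_basis([0.0, 1.0, 2.0, 3.0], 2)` returns a $4 \times 2$ matrix whose first row is `[-1.5, -3.5]`, and `piecewise_disc_linear_basis` with two slices returns rows of length $2h-1 = 3$.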
In all cases, the basis must be constructed such that $F^TF$ is invertible, where $F$ is the $n \times r$ data-matrix whose $i$th row is $f_y^T$ evaluated at the $i$th observed response.
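The invertibility requirement explains why the piecewise constant and categorical bases use $h-1$ components rather than $h$: the $h$ slice indicators sum to one, so after column-wise centering they sum to zero and $F^TF$ is singular. The following Python sketch checks this numerically on a toy example. The helper names and the tolerance are hypothetical choices made for illustration, and the rank test by Gaussian elimination is just one simple way to check invertibility.

```python
def centered_indicators(labels, n_cols):
    """Centered indicator columns for slice labels 0, 1, ..., keeping n_cols of them."""
    n = len(labels)
    rows = [[1.0 if s == k else 0.0 for k in range(n_cols)] for s in labels]
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def gram(F):
    """Return F^T F for a data-matrix F given as a list of rows."""
    cols = list(zip(*F))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(M, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    n_rows, n_cols = len(A), len(A[0])
    rk = 0
    for c in range(n_cols):
        if rk == n_rows:
            break
        pivot = max(range(rk, n_rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        A[rk], A[pivot] = A[pivot], A[rk]
        for i in range(rk + 1, n_rows):
            factor = A[i][c] / A[rk][c]
            for j in range(c, n_cols):
                A[i][j] -= factor * A[rk][j]
        rk += 1
    return rk

def is_invertible(F):
    """True when F^T F has full rank."""
    G = gram(F)
    return rank(G) == len(G)

labels = [0, 0, 1, 1, 2, 2]  # six observations in h = 3 slices
full = centered_indicators(labels, 3)     # all h indicators: columns sum to zero
reduced = centered_indicators(labels, 2)  # h - 1 indicators, as in the text
print(is_invertible(full), is_invertible(reduced))
```

With all three indicator columns the check fails, while dropping one column restores a full-rank $F^TF$.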