basis
lmmSpline(data, time, sampleID, timePredict, deri, basis, knots, keepModels, numCores)
data: data.frame or matrix containing the samples as rows and features as columns.
time: numeric vector containing the sample time point information.
sampleID: character, numeric or factor vector containing information about the unique identity of each sample.
timePredict: numeric vector containing the time points to be predicted. By default set to the original time points observed in the experiment.
deri: logical value. If TRUE, returns the predicted derivative information on the observed time points. By default set to FALSE.
basis: character string. What type of basis to use, matching one of "cubic", "p-spline" or "cubic p-spline". The "cubic" basis (default) is the cubic smoothing spline as defined by Ver
knots: integer, the number of knots used for the "p-spline" or "cubic p-spline" basis calculation. Otherwise calculated as proposed by Ruppert 2002. Not used for the "cubic" smoothing spline basis as it use
keepModels: logical value if you want to keep the model output. Default value is FALSE.
numCores: numeric value indicating the number of CPU cores to be used. Default value is automatically estimated.

The first model (modelsUsed=0) assumes the response is a straight line not affected by individual variation.
Let $y_{ij}(t_{ij})$ be the expression of a feature for individual (or biological replicate) $i$ at time $t_{ij}$, where $i=1,2,...,n$, $j=1,2,...,m_i$, $n$ is the sample size and $m_i$ is the number of observations for individual $i$ for the given feature.
We fit a simple linear regression of expression $y_{ij}(t_{ij})$ on time $t_{ij}$.
The intercept $\beta_0$ and slope $\beta_1$ are estimated via ordinary least squares:
$$y_{ij}(t_{ij}) = \beta_0 + \beta_1 t_{ij} + \epsilon_{ij},$$
where $\epsilon_{ij} \sim N(0,\sigma^2_{\epsilon}).$
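As a minimal sketch of this first model, the regression can be fitted with base R's lm on simulated data. The data frame d, its column names and all simulated values are assumptions made for illustration. This is not necessarily how lmmSpline fits the model internally.

```r
# Simulated data for one feature: 4 individuals, 5 time points each.
# All values are made up for illustration.
set.seed(1)
d <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(c(0, 1, 2, 4, 8), times = 4)
)
d$y <- 2 + 0.5 * d$time + rnorm(nrow(d), sd = 0.3)

# Ordinary least squares fit of y on time, pooling all individuals
fit0 <- lm(y ~ time, data = d)
beta <- coef(fit0)                 # beta[1] = intercept, beta[2] = slope
sigma_eps <- summary(fit0)$sigma   # estimate of the residual standard deviation
```

Because the individual identifier is ignored, every observation is treated as an independent draw around the same straight line.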
The second model (modelsUsed=1) is nonlinear: the straight line of the regression is replaced with a curve modelled using, for example, a spline truncated line basis (basis="p-spline") as proposed by Durban et al. 2005:
$$y_{ij}(t_{ij})= f(t_{ij}) +\epsilon_{ij},$$
where $\epsilon_{ij} \sim N(0,\sigma_{\epsilon}^2).$
The penalized spline is represented by $f$, which depends on a set of knot positions $\kappa_1,...,\kappa_K$ in the range of $\{t_{ij}\}$, some unknown coefficients $u_k$, an intercept $\beta_0$ and a slope $\beta_1$. The first term in the above equation can therefore be expanded as:
$$f(t_{ij})= \beta_0+ \beta_1t_{ij}+\sum\limits_{k=1}^{K}u_k(t_{ij}-\kappa_k)_+,$$
with $(t_{ij}-\kappa_k)_+ = t_{ij}-\kappa_k$ if $t_{ij}-\kappa_k > 0$, and $0$ otherwise.
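The truncated line basis can be sketched in a few lines of base R. The helper truncated_line_basis, the time points and the knot positions below are hypothetical and chosen only to show how the columns $(t_{ij}-\kappa_k)_+$ are built. They are not part of the package.

```r
# Hypothetical helper (not a package function): truncated line basis.
# Returns a length(t) x length(knots) matrix whose [j, k] entry is (t_j - kappa_k)_+.
truncated_line_basis <- function(t, knots) {
  Z <- outer(t, knots, "-")   # t_j - kappa_k for every pair
  Z[Z < 0] <- 0               # keep the positive part only
  Z
}

t_obs <- c(0, 1, 2, 4, 8)
Z <- truncated_line_basis(t_obs, knots = c(1.5, 3))

# Fixed-effect design (intercept and slope) that sits alongside the spline columns
X <- cbind(1, t_obs)
```

Each column of Z is zero up to its knot and increases linearly afterwards, so the coefficient $u_k$ changes the slope of $f$ at $\kappa_k$.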
The choice of the number of knots $K$ and their positions influences the flexibility of the curve.
If the argument knots is missing, we use a method proposed by Ruppert 2002 to estimate the number of knots $K$ from the measured number of time points $T$; the knots $\kappa_1 \ldots \kappa_K$ are then placed at quantiles of the time interval of interest:
$$K = \max\left(5, \min\left(\left\lfloor \frac{T}{4} \right\rfloor, 40\right)\right).$$
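The rule for $K$ can be checked with a short sketch. The helper ruppert_knots is hypothetical. It assumes that $T$ is the number of distinct time points and that the knots sit at equally spaced interior quantile levels; the package may place them differently.

```r
# Hypothetical helper (not a package function): number and position of knots
ruppert_knots <- function(time) {
  t_unique <- sort(unique(time))
  n_t <- length(t_unique)                 # T in the formula (assumed: distinct time points)
  K <- max(5, min(floor(n_t / 4), 40))
  # K interior quantile levels, excluding 0 and 1 (an assumed placement rule)
  probs <- seq_len(K) / (K + 1)
  list(K = K, knots = as.numeric(quantile(t_unique, probs = probs)))
}

ruppert_knots(1:8)$K     # floor(8 / 4) = 2, so the lower bound of 5 applies
ruppert_knots(1:100)$K   # floor(100 / 4) = 25
ruppert_knots(1:400)$K   # floor(400 / 4) = 100, capped at 40
```

For short time courses the lower bound of 5 always applies, so $K$ only starts to grow once there are at least 24 time points.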
In order to account for individual variation, our third model (modelsUsed=2) adds a subject-specific random effect $U_i$ to the mean response $f(t_{ij})$.
Assuming $f(t_{ij})$ to be a fixed (yet unknown) population curve, $U_i$ is treated as a random realization of an underlying Gaussian process with zero-mean and variance $\sigma_U^2$ and is independent from the random error $\epsilon_{ij}$:
$$y_{ij}(t_{ij}) = f(t_{ij}) + U_i + \epsilon_{ij}$$
with $U_{i} \sim N(0,\sigma_U^2)$ and $\epsilon_{ij} \sim N(0,\sigma_{\epsilon}^2)$.
In the equation above, the individual curves are expected to be parallel to the mean curve, as we assume each individual's deviation from the mean curve to be constant over time.
A simple extension to this model is to assume individual deviations are straight lines. The fourth model (modelsUsed=3) therefore fits individual-specific random intercepts $a_{i0}$ and slopes $a_{i1}$:
$$y_{ij}(t_{ij}) = f(t_{ij}) + a_{i0} + a_{i1}t_{ij} + \epsilon_{ij}$$
with $\epsilon_{ij} \sim N(0,\sigma_\epsilon^2)$ and $(a_{i0},a_{i1})^T \sim N(0,\Sigma).$
We assume independence between the random intercept and slope.
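The two random-effect structures can be illustrated with lme from the nlme package, used here as one possible tool and not necessarily what lmmSpline calls internally. To keep the sketch short, the population curve $f$ is simplified to a straight line; the simulated data, column names and parameter values are all assumptions.

```r
library(nlme)  # assumed to be installed; it is a recommended package shipped with R

# Simulated data: 10 individuals, each with its own intercept and slope deviation
set.seed(42)
n_id  <- 10
times <- c(0, 1, 2, 4, 8)
d <- data.frame(id   = factor(rep(seq_len(n_id), each = length(times))),
                time = rep(times, times = n_id))
a0  <- rnorm(n_id, sd = 1)      # random intercepts a_i0
a1  <- rnorm(n_id, sd = 0.3)    # random slopes a_i1
idx <- as.integer(d$id)
d$y <- 2 + 0.5 * d$time + a0[idx] + a1[idx] * d$time + rnorm(nrow(d), sd = 0.2)

# Random intercept only: individual curves are parallel to the mean curve
fit_int <- lme(y ~ time, random = ~ 1 | id, data = d)

# Independent random intercept and slope: pdDiag gives a diagonal Sigma
fit_slope <- lme(y ~ time, random = list(id = pdDiag(~ time)), data = d)

fixef(fit_slope)    # estimates of beta_0 and beta_1
VarCorr(fit_slope)  # estimated variance components
```

Using pdDiag encodes the independence assumption between intercept and slope; an unstructured covariance would additionally estimate their correlation.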
lmmSpline returns an object of class lmmspline containing the following components:
data.frame containing predicted values based on the linear model object or linear mixed effect model object.
numeric vector indicating the model used to fit the data: 0 = linear model, 1 = linear mixed effect model spline (LMMS) with defined basis ("cubic" by default), 2 = LMMS with subject-specific random intercept, 3 = LMMS with subject-specific random intercept and slope.
list of models used to model time profiles.
logical value indicating if the predicted values are the derivative information.

See also: summary.lmmspline, plot.lmmspline, predict.lmmspline, deriv.lmmspline
data(kidneySimTimeGroup)
# running for samples in group 1
G1 <- which(kidneySimTimeGroup$group == "G1")
testLMMSpline <- lmmSpline(data = kidneySimTimeGroup$data[G1, ],
                           time = kidneySimTimeGroup$time[G1],
                           sampleID = kidneySimTimeGroup$sampleID[G1])
summary(testLMMSpline)
# same samples, returning derivative information with the "p-spline" basis
DerivTestLMMSplineTG <- lmmSpline(data = as.data.frame(kidneySimTimeGroup$data[G1, ]),
                                  time = kidneySimTimeGroup$time[G1],
                                  sampleID = kidneySimTimeGroup$sampleID[G1],
                                  deri = TRUE, basis = "p-spline")
summary(DerivTestLMMSplineTG)