splitSplines: Fits a natural cubic smoothing spline to subsets of a response and adds the fitted values to a `data.frame`

Description

Uses fitSpline to fit a spline to each subset of the values of response and stores the fitted values in data. Each subset consists of the values that share the same combination of levels of the factors listed in INDICES, and the degree of smoothing is controlled by df. The derivatives of the fitted spline can also be obtained, as can the Relative Growth Rates (RGR).

By default, smooth.spline will issue an error if there are not at least four distinct x-values. On the other hand, fitSpline issues a warning and sets all smoothed values and derivatives to NA. The handling of missing values in the observations is controlled via na.x.action and na.y.action.
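The split-then-smooth idea described above can be sketched in base R. This is only an illustration using stats::smooth.spline on synthetic data, not the package's own implementation; the data frame dat, its columns plant, day and area, and the helper smooth_subset are all hypothetical names introduced here.

```r
# Synthetic long-format data: two plants measured on ten days (made-up values).
set.seed(1)
dat <- data.frame(
  plant = factor(rep(c("A", "B"), each = 10)),
  day   = rep(1:10, times = 2)
)
dat$area <- 5 * dat$day + rnorm(nrow(dat))

# Hypothetical helper: fit one spline to a subset and append the fitted values.
# With fewer than four distinct x-values it warns and returns NA, mimicking
# the behaviour that the text attributes to fitSpline.
smooth_subset <- function(sub, df = 4) {
  if (length(unique(sub$day)) < 4) {
    warning("fewer than four distinct x-values; smoothed values set to NA")
    sub$area.smooth <- NA_real_
    return(sub)
  }
  fit <- smooth.spline(sub$day, sub$area, df = df)
  sub$area.smooth <- predict(fit, x = sub$day)$y
  sub
}

# Form the subsets with split, smooth each one, and stack the results.
pieces   <- split(dat, dat$plant)
smoothed <- do.call(rbind, lapply(pieces, smooth_subset))
rownames(smoothed) <- NULL
```

Each plant gets its own spline, so a poorly behaved series in one subset does not influence the fit for another.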

Usage

splitSplines(data, response, x, INDICES, df = NULL, smoothing.scale = "identity", 
             correctBoundaries = FALSE, 
             deriv = NULL, suffices.deriv=NULL, RGR=NULL, AGR=NULL, sep=".", 
             na.x.action="exclude", na.y.action = "exclude", ...)

Value

A data.frame containing data to which has been added a column with the fitted smooth, the name of the column being response with .smooth appended to it. If deriv is not NULL, columns containing the values of the derivative(s) will be added to data; the name of each of these columns will be the value of response with .smooth.dvf appended, where f is the order of the derivative, or the value of response with .smooth. and the corresponding element of suffices.deriv appended. If RGR is not NULL, the RGR is calculated as the ratio of the value of the first derivative of the fitted spline to the fitted value of the spline. Any pre-existing smoothed and derivative columns in data will be replaced. The ordering of the data.frame for the x values will be preserved as far as is possible; the main difficulty is with the handling of missing values by the function merge. Thus, if missing values in x are retained, they will occur at the bottom of each subset of INDICES and the order will be problematic when there are missing values in y and na.y.action is set to omit.
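The column naming and the RGR ratio described above can be illustrated with a single series in base R. This is a sketch under assumptions: the data are synthetic, stats::smooth.spline stands in for the package's fitting routine, and the RGR column name is assumed to follow the same .smooth. prefix pattern as the derivative columns.

```r
# One synthetic, roughly exponential growth series (made-up values).
set.seed(2)
x <- 1:12
y <- exp(0.2 * x) + rnorm(12, sd = 0.05)
response <- "Area"

fit <- smooth.spline(x, y, df = 5)
out <- data.frame(xDays = x, Area = y)

# Column names built as the text describes: response + ".smooth", then a suffix.
smooth.name <- paste0(response, ".smooth")
agr.name    <- paste(smooth.name, "AGRdv", sep = ".")  # as if suffices.deriv = "AGRdv"
rgr.name    <- paste(smooth.name, "RGRdv", sep = ".")  # as if RGR = "RGRdv" (assumed pattern)

out[[smooth.name]] <- predict(fit, x)$y             # fitted smooth
out[[agr.name]]    <- predict(fit, x, deriv = 1)$y  # first derivative
# RGR on the identity scale: first derivative divided by the fitted value.
out[[rgr.name]]    <- out[[agr.name]] / out[[smooth.name]]
```

Because the RGR is a ratio, it is only meaningful where the fitted smooth is away from zero.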

Arguments

data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
x: A character giving the name of the column in data that contains the values of the predictor variable.
INDICES: A character giving the name(s) of the factor(s) that define the subsets of response that are to be smoothed separately. If the columns corresponding to INDICES are not factor(s) then they will be coerced to factor(s). The subsets are formed using split.
df: A numeric specifying the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, ordinary leave-one-out cross-validation is used to determine the amount of smoothing.
smoothing.scale: A character giving the scale on which smoothing is to be performed. The two possibilities are "identity", for directly smoothing the observed response, and "logarithmic", for smoothing the log-transformed response.
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that deriv must be NULL for correctBoundaries to be set to TRUE.
deriv: A numeric specifying one or more orders of derivatives that are required.
suffices.deriv: A character giving the characters to be appended to the names of the derivatives.
RGR: A character giving the character to be appended to the smoothed response to create the RGR name, but only when smoothing.scale is identity. When smoothing.scale is identity: (i) if RGR is not NULL deriv must include 1 so that the first derivative is available for calculating the RGR; (ii) if RGR is NULL, the RGR is not calculated from the AGR. When smoothing.scale is logarithmic, the RGR is the backtransformed first derivative and so, to obtain it, merely include 1 in deriv and any suffix for it in suffices.deriv.
AGR: A character giving the character to be appended to the smoothed response to create the AGR name, but only when smoothing.scale is logarithmic. When smoothing.scale is logarithmic: (i) if AGR is not NULL, deriv must include 1 so that the first derivative is available for calculating the AGR; (ii) if AGR is NULL, the AGR is not calculated from the RGR. When smoothing.scale is identity, the AGR is the first derivative and so, to obtain it, merely include 1 in deriv and any suffix for it in suffices.deriv.
sep: A character giving the separator to use when the levels of INDICES are combined. This is needed to avoid using a character that occurs in a factor to delimit levels when the levels of INDICES are combined to identify subsets.
na.x.action: A character string that specifies the action to be taken when values of x are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between the two is that, for exclude, the returned data.frame will have as many rows as data because the rows with missing values are incorporated into it.
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values that have been ordered for x; for ltrimx and rtrimx either the lower or upper missing y values, respectively, are trimmed.
...: allows for arguments to be passed to smooth.spline.
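To make the na.y.action options concrete, the following base R sketch works out which x-values would receive predictions under each option for one small series. It is an interpretation of the descriptions above, not code taken from the package; the vectors x and y and the list pred.x are hypothetical.

```r
# One series with missing responses at both ends and in the middle (made-up values).
x <- 1:8
y <- c(NA, 2.1, 3.9, NA, 8.2, 9.8, NA, NA)

ok  <- !is.na(x)          # nonmissing x
obs <- ok & !is.na(y)     # nonmissing x that also have a nonmissing y

# x-values of the first and last nonmissing y, with the data ordered by x.
first <- min(x[obs])
last  <- max(x[obs])

# x-values at which predictions would be obtained under each option.
pred.x <- list(
  exclude = x[obs],                           # same x-values as for omit
  allx    = x[ok],                            # every nonmissing x
  trimx   = x[ok & x >= first & x <= last],   # both ends trimmed
  ltrimx  = x[ok & x >= first],               # lower end trimmed
  rtrimx  = x[ok & x <= last]                 # upper end trimmed
)
```

In every case the spline itself is fitted only to the pairs with a nonmissing y; the options differ in where the fitted curve is then evaluated.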

Author

Chris Brien

References

Huang, C. (2001). Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.

Examples


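# Load the example data; longi.dat is assumed to be made available by it
data(exampleData)
# Smooth Area against xDays separately for each Snapshot.ID.Tag using 4 df,
# adding the first derivative (suffix AGRdv) and the RGR (suffix RGRdv)
longi.dat <- splitSplines(longi.dat, response="Area", x="xDays", 
                          INDICES = "Snapshot.ID.Tag", 
                          df = 4, deriv=1, suffices.deriv="AGRdv", RGR="RGRdv")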
