getCurves: Construct Simultaneous Principal Curves

Description

This function constructs simultaneous principal curves, the second step in Slingshot's trajectory inference procedure. It takes a (specifically formatted) PseudotimeOrdering object, as is returned by the first step, getLineages. The output is another PseudotimeOrdering object, containing the simultaneous principal curves, pseudotime estimates, and lineage assignment weights.

Usage

getCurves(data, ...)
# S4 method for PseudotimeOrdering
getCurves(
  data,
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = NULL,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)
# S4 method for SingleCellExperiment
getCurves(data, ...)
# S4 method for SlingshotDataSet
getCurves(data, ...)

Arguments

data

a data object containing lineage information provided by getLineages, to be used for constructing simultaneous principal curves. Supported types include SingleCellExperiment, SlingshotDataSet, and PseudotimeOrdering (recommended).

...

Additional parameters to pass to scatter plot smoothing function, smoother.

shrink

logical or numeric between 0 and 1, determines whether and how much to shrink branching lineages toward their average prior to the split (default = TRUE).

extend

character, how to handle root and leaf clusters of lineages when constructing the initial, piece-wise linear curve. Accepted values are 'y' (default), 'n', and 'pc1'. See 'Details' for more.

reweight

logical, whether to allow cells shared between lineages to be reweighted during curve fitting. If TRUE (default), cells shared between lineages will be iteratively reweighted based on the quantiles of their projection distances to each curve. See 'Details' for more.

reassign

logical, whether to reassign cells to lineages at each iteration. If TRUE (default), cells will be added to a lineage when their projection distance to the curve is less than the median distance for all cells currently assigned to the lineage. Additionally, shared cells will be removed from a lineage if their projection distance to the curve is above the 90th percentile and their weight along the curve is less than 0.1.

thresh

numeric, determines the convergence criterion. Percent change in the total distance from cells to their projections along curves must be less than thresh. Default is 0.001, similar to principal_curve.

maxit

numeric, maximum number of iterations (default = 15), see principal_curve.

stretch

numeric factor by which curves can be extrapolated beyond endpoints. Default is 2, see principal_curve.

approx_points

numeric, whether curves should be approximated by a fixed number of points. If FALSE (or 0), no approximation will be performed and curves will contain as many points as the input data. If numeric, curves will be approximated by this number of points (default = 150 or #cells, whichever is smaller). See 'Details' and principal_curve for more.

smoother

choice of scatter plot smoother. Same as principal_curve, but "lowess" option is replaced with "loess" for additional flexibility.

shrink.method

character denoting how to determine the appropriate amount of shrinkage for a branching lineage. Accepted values are the same as for kernel in density (default is "cosine"), as well as "tricube" and "density". See 'Details' for more.

allow.breaks

logical, determines whether curves that branch very close to the origin should be allowed to have different starting points.

Value

An updated PseudotimeOrdering object containing the pseudotime estimates and lineage assignment weights in the assays. It will also include the original information provided by getLineages, as well as the following new elements in the metadata:

curves: A list of principal_curve objects.
slingParams: Additional parameters used for fitting simultaneous principal curves.

Details

This function constructs simultaneous principal curves (one per lineage). Cells are mapped to curves by orthogonal projection and pseudotime is estimated by the arclength along the curve (also called lambda, in the principal_curve objects).
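
To make the projection step concrete, here is a minimal base R sketch that projects points onto a piece-wise linear curve and reports the arclength at each projection. It is an illustration only, not Slingshot's or principal_curve's implementation: the helper name project_to_polyline and the toy coordinates are hypothetical, and projections are clamped to segment endpoints, so points beyond the ends of the curve are not extrapolated.

```r
# Project each row of x onto the polyline with vertices s (base R only).
# Returns the arclength (lambda) and squared distance for each point.
project_to_polyline <- function(x, s) {
  seg <- diff(s)                      # segment vectors, one per row
  seg_len <- sqrt(rowSums(seg^2))     # length of each segment
  start_len <- c(0, cumsum(seg_len))  # arclength at the start of each segment
  lambda <- numeric(nrow(x))
  dist2 <- rep(Inf, nrow(x))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(seg))) {
      # fractional position of the projection along segment j, clamped to [0, 1]
      frac <- sum((x[i, ] - s[j, ]) * seg[j, ]) / seg_len[j]^2
      frac <- min(max(frac, 0), 1)
      proj <- s[j, ] + frac * seg[j, ]
      d2 <- sum((x[i, ] - proj)^2)
      if (d2 < dist2[i]) {            # keep the closest segment
        dist2[i] <- d2
        lambda[i] <- start_len[j] + frac * seg_len[j]
      }
    }
  }
  list(lambda = lambda, dist2 = dist2)
}

# Hypothetical curve with two segments and two cells to project
s <- rbind(c(0, 0), c(1, 0), c(1, 1))
x <- rbind(c(0.5, 0.2), c(1.3, 0.5))
res <- project_to_polyline(x, s)
res$lambda  # arclength of each projection, playing the role of pseudotime
```

In this toy setting, the first cell projects onto the first segment and the second cell onto the second, so its arclength includes the full length of the first segment.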

When there is only a single lineage, the curve-fitting algorithm is nearly identical to that of principal_curve. When there are multiple lineages and shrink > 0, an additional step is added to the iterative procedure, forcing curves to be similar in the neighborhood of shared points (i.e., before they branch).
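
The stopping rule that governs this iterative procedure (the thresh and maxit arguments) can be sketched as follows. This is an assumption-laden illustration: the helper has_converged and the sequence of total distances are hypothetical, and the criterion is written as a relative change in total distance, which is how we read the description of thresh.

```r
# Hypothetical convergence check: relative change in total projection distance
has_converged <- function(dist_old, dist_new, thresh = 0.001) {
  abs(dist_old - dist_new) / dist_old < thresh
}

# Made-up total distances after successive iterations of curve fitting
total_dist <- c(120, 100, 99.5, 99.45)
maxit <- 15

it <- 1
while (it < length(total_dist) && it <= maxit &&
       !has_converged(total_dist[it], total_dist[it + 1])) {
  it <- it + 1
}
it  # index of the iteration whose update changed the total distance by < thresh
```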

The approx_points argument, which sets the number of points to be used for each curve, can have a large effect on computation time. Due to this consideration, we set the default value to 150 whenever the input dataset contains more than that many cells. This setting should help with exploratory analysis while having little to no impact on the final curves. To disable this behavior and construct curves with the maximum number of points, set approx_points = FALSE.

The extend argument determines how to construct the piece-wise linear curve used to initialize the iterative algorithm. The initial curve is always based on the lines between cluster centers. If extend = 'n', this curve will terminate at the centers of the endpoint clusters. Setting extend = 'y' will allow the first and last segments to extend beyond the cluster center to the orthogonal projection of the furthest point. Setting extend = 'pc1' is similar to 'y', but uses the first principal component of the cluster to determine the direction of the curve beyond the cluster center. These options typically have limited impact on the final curve, but can occasionally help with stability issues.

When shrink = TRUE, we compute a percent shrinkage curve, \(w_l(t)\), for each lineage, a non-increasing function of pseudotime that determines how much that lineage should be shrunk toward a shared average curve. We set \(w_l(0) = 1\) (complete shrinkage), so that the curves will always perfectly overlap the average curve at pseudotime 0. The weighting curve decreases from 1 to 0 over the non-outlying pseudotime values of shared cells (where outliers are defined by the 1.5*IQR rule). The exact shape of the curve in this region is controlled by shrink.method, and can follow the shape of any standard kernel function's cumulative density curve (or more precisely, survival curve, since we require a decreasing function). Different choices of shrink.method tend to have no discernible impact on the final curves, in most cases.
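
As a rough illustration of such a shrinkage curve, the base R sketch below builds a decreasing weight function from the survival curve of a raised-cosine kernel, rescaled to the range of non-outlying shared pseudotimes. The helper shrink_weight, the toy pseudotimes, and the closed-form kernel are assumptions made for this example; the package's internal computation may differ.

```r
# Hypothetical shrinkage weight: 1 before the shared region, 0 after it,
# following a raised-cosine kernel's survival curve in between.
shrink_weight <- function(t, shared_pst) {
  # drop outliers among shared cells with the 1.5 * IQR rule
  q <- stats::quantile(shared_pst, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  inliers <- shared_pst[shared_pst >= q[1] - fence & shared_pst <= q[2] + fence]
  lo <- min(inliers)
  hi <- max(inliers)
  # map pseudotime to the kernel's support [-1, 1], clamping outside values
  u <- pmin(pmax(2 * (t - lo) / (hi - lo) - 1, -1), 1)
  # CDF of the kernel (1 + cos(pi * u)) / 2 on [-1, 1]
  cdf <- (u + 1 + sin(pi * u) / pi) / 2
  1 - cdf  # survival curve: non-increasing from 1 to 0
}

# Made-up pseudotimes of shared cells, with one outlier at 15
shared_pst <- c(seq(0, 4, length.out = 41), 15)
w <- shrink_weight(c(0, 2, 4, 10), shared_pst)
w  # weights at pseudotimes 0, 2, 4 and 10
```

Because the outlier at 15 is excluded, the weight already reaches 0 at pseudotime 4 rather than decaying slowly out to 15.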

When reweight = TRUE, weights for shared cells are based on the quantiles of their projection distances onto each curve. The distances are ranked and converted into quantiles between 0 and 1, which are then transformed by 1 - q^2. Each cell's weight along a given lineage is the ratio of this value to the maximum value for this cell across all lineages.
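
The reweighting rule can be sketched in a few lines of base R. The distance matrix below is made up, and the quantiles are computed with a simple unweighted ranking pooled over all cell-lineage pairs, which is a simplifying assumption rather than a description of the package internals.

```r
# Hypothetical projection distances: 5 shared cells (rows) x 2 lineages (columns)
D <- matrix(c(0.1, 0.5, 0.9, 0.3, 0.7,
              0.8, 0.2, 0.4, 0.6, 1.0),
            ncol = 2,
            dimnames = list(paste0("cell", 1:5), c("Lineage1", "Lineage2")))

# Rank all distances and convert the ranks to quantiles in (0, 1]
Q <- matrix(rank(D) / length(D), nrow = nrow(D), dimnames = dimnames(D))

# Transform quantiles so that small distances give values near 1
S <- 1 - Q^2

# Scale each cell's values by its maximum across lineages
W <- S / apply(S, 1, max)
W
```

Each cell receives weight 1 on the lineage it is closest to (in quantile terms) and a smaller weight on the other, so a cell that is far from one curve contributes little to that curve's fit.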

References

Hastie, T., and Stuetzle, W. (1989). "Principal Curves." Journal of the American Statistical Association, 84:502--516.

Examples

# NOT RUN {
data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl
pto <- getLineages(rd, cl, start.clus = '1')
pto <- getCurves(pto)

# plotting
sds <- as.SlingshotDataSet(pto)
plot(rd, col = cl, asp = 1)
lines(sds, type = 'c', lwd = 3)

# }