ImpShift: Search for the optimal sample order using the extended nearest insertion algorithm

Description

Search for the optimal sample order using the extended nearest insertion (ENI) algorithm.

Usage

ImpShift(Data, Seq=NULL, NChun=4, RdmStart=FALSE, Ndg=3)

Arguments

Data

A gene-by-sample or isoform-by-sample matrix. Its values should be rescaled to lie between -1 and 1.

Seq

NULL, or a vector giving a sample order. If specified, the samples are first reordered by this vector.

NChun

Number of starting points for the polynomial fitting.

RdmStart

Logical; whether the starting points are selected at random.

Ndg

Degree of the polynomial.
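Because Data is expected to lie in [-1, 1] and Seq reorders the samples before any fitting, the preparation step can be sketched in R as below. prep_data() is a hypothetical helper, and per-row min-max scaling is only one assumed way to reach [-1, 1]; the package may provide or expect a different normalization.

```r
# Hypothetical helper (not part of the package): optionally reorder the
# samples by Seq, then min-max rescale each row to [-1, 1].
# Per-row min-max scaling is an assumption, not the package's method.
prep_data <- function(Data, Seq = NULL) {
  # Reorder the columns (samples) first, mirroring the Seq argument
  if (!is.null(Seq)) Data <- Data[, Seq, drop = FALSE]
  row_min <- apply(Data, 1, min)
  row_max <- apply(Data, 1, max)
  rng <- row_max - row_min
  constant <- rng == 0
  rng[constant] <- 1  # avoid division by zero for constant rows
  # Vector recycling runs down the columns, so row i uses row_min[i], rng[i]
  out <- 2 * (Data - row_min) / rng - 1
  out[constant, ] <- 0  # constant rows carry no shape; map them to 0
  out
}
```

The returned matrix keeps the original row names and can then be supplied as the Data argument.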

Value

This function performs the extended nearest insertion (ENI). The ENI algorithm searches for the sample order that minimizes the mean squared error (MSE) of a sliding polynomial regression (SPR). It calls the PipeShiftCDF() function, which fits the SPR to each row of the data.

For each gene/isoform, the SPR fits NChun polynomial curves of degree Ndg, each using a different starting point (sample). Samples that precede the starting point in the order are appended after the last sample, so every fit considers the same number of samples. If RdmStart = TRUE, the starting points are selected at random; otherwise they are evenly spaced along the sample order.

The aggregated MSE of a fit (using a specific starting point) is the sum of the MSEs of all genes/isoforms considered. The MSE of the SPR is the largest aggregated MSE across the fits with different starting points. The function returns the optimal order, that is, the order with the smallest SPR MSE.
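The SPR objective described above can be sketched in R as follows. spr_mse() and best_order() are hypothetical helpers, not the package's PipeShiftCDF() or ImpShift(). Fitting each row against the sample positions 1..n with an ordinary least-squares polynomial (lm() with poly(..., raw = TRUE)) is an assumption about how the fits are parameterized, and best_order() only compares a supplied list of candidate orders; it does not reproduce the ENI search itself.

```r
# Hypothetical sketch of the SPR MSE; assumes OLS polynomial fits of each
# row against positions 1..n. Not the package's PipeShiftCDF().
spr_mse <- function(Data, order, NChun = 4, RdmStart = FALSE, Ndg = 3) {
  X <- Data[, order, drop = FALSE]
  n <- ncol(X)
  # Starting points: random, or evenly spaced along the sample order
  starts <- if (RdmStart) {
    sort(sample.int(n, NChun))
  } else {
    floor(seq(1, n, length.out = NChun + 1))[seq_len(NChun)]
  }
  pos <- seq_len(n)
  agg <- sapply(starts, function(s) {
    # Rotate so sample s comes first; earlier samples follow the last one
    idx <- c(s:n, seq_len(s - 1))
    # Aggregated MSE: sum over rows of each row's polynomial-fit MSE
    sum(apply(X[, idx, drop = FALSE], 1, function(y) {
      fit <- lm(y ~ poly(pos, Ndg, raw = TRUE))
      mean(residuals(fit)^2)
    }))
  })
  max(agg)  # SPR MSE: largest aggregated MSE across starting points
}

# Hypothetical: pick the candidate order with the smallest SPR MSE
# (a brute-force comparison, not the ENI search)
best_order <- function(Data, candidates, ...) {
  scores <- sapply(candidates, function(o) spr_mse(Data, o, ...))
  candidates[[which.min(scores)]]
}

# Usage: score the natural order of a toy matrix and compare two orders
toy <- rbind(aa = sin(seq(0, 1, 0.1)), bb = sin(seq(0.5, 1.5, 0.1)))
spr_mse(toy, 1:11, NChun = 2)
best_order(toy, list(1:11, 11:1), NChun = 2)
```

Taking the maximum over starting points means an order is only scored well if every rotation of it can be fitted closely, which is the worst-case criterion the description above defines.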

Examples


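library(Oscope)  # ImpShift() is provided by the Oscope package
# Three toy "genes" observed at 11 samples, already within [-1, 1]
aa <- sin(seq(0, 1, 0.1))
bb <- sin(seq(0.5, 1.5, 0.1))
cc <- sin(seq(0.9, 1.9, 0.1))
# Search for the sample order using 2 starting points for the SPR fits
res <- ImpShift(rbind(aa, bb, cc), NChun = 2)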
