Learn R Programming

Oscope (version 1.2.0)

OscopeENI: Search for the optimal sample order for different gene clusters

Description

Search for the optimal sample order for different gene clusters

Usage

OscopeENI(KMRes, Data, ClusterUse=NULL, Ndg=3, NChun=4, RdmStart=FALSE, N=20000, NCThre=1000, parallel=FALSE, parallelParam=NULL)

Arguments

KMRes
output of OscopeKM() function.
Data
gene-by-sample matrix or isoform-by-sample matrix. It should be rescaled to values bwteen [-1,1].
ClusterUse
a vector indicating what clusters are of interest. For example, by setting ClusterUse = c(1,2,3), only the top 3 clusters will be considered while recovering the base cycle order. If ClusterUse=NULL, all clusters will be used.
Ndg
degree of polynomial.
NChun
number of starting points for polynomial fitting.
RdmStart
whether the start points are randomly selected.
N,NCThre
The 2-opt algorithm will stop if N iterations has been performed or if the optimal order
parallel
whether apply parallel computing. if it is TRUE, BiocParallel will be called.
parallelParam
a SnowParam object to specify the clusters. If it is NULL, the default will be set as SnowParam(workers = 5, type = "SOCK") remains unchanged for over NCThre iterations.

Value

This function performs the extended nearest insertion (ENI) and 2-opt algorithm to all clusters (or a subset of picked clusters) identified by OscopeKM function. The function will recover independent orders to each of the clusters. For each cluster, the ENI algorithm will be applied to search for the optimal sample order which minimizes the MSE of sliding polynomial regression (SPR). This function will call PipeShiftCDF() function, which fits SPR to expression of each gene/isoform within a cluster. For each gene/isoform, SPR fits NChun polynomial curves with different starting points (samples). The samples with smaller order than the start point will be appended to follow the last sample when fitting. So each fitting consider same number of samples. If RdmStart = TRUE, the start points are randomly selected. Otherwise they are evenly sampled along the sample order. The aggregated MSE of a fit (using a specific start point) is defined as the summation of the MSEs of all genes/isoforms considered here. The MSE of the SPR is defined as the largest aggregated MSE across fits using different start points. The output of PipeShiftCDF() returns the optimal order which provides the smallest SPR MSE. The 2-opt algorithm will then be applied to improve the optimal order searching of the ENI. In each iteration, the 2-opt algorithm will randomly choose two points (samples), the flip the samples between these two points. The new order will be adapted if it provides smaller SPR MSE. The output returns the optimal order for each cluster of interest. It is a list with multiple sublists, in which each sublist includes the recovered order of the corresponding cluster in ClusterUse. If ClusterUse is not specified, the k th sublist shows the recovered order in KMRes

Examples

Run this code
aa <- sin(seq(0,1,.1))
bb <- sin(seq(0.5,1.5,.1))
cc <- sin(seq(0.9,1.9,.1))
dd <- sin(seq(1.2,2.2,.1))
res <- OscopeENI(list(c1=c("aa","bb"),c2=c("cc","dd")), rbind(aa,bb,cc,dd), NChun=2, N=50)

Run the code above in your browser using DataLab