Two variables must be defined in the global environment: geData and
stData. geData is a matrix whose columns are the gene expressions and whose rows
are the samples (see geNSCLC for an example). It is recommended that the column
names be set. stData is an object of the "Surv" class from the
package "survival" (see stNSCLC for an example).
Starting from the seed gene (a list of seeds is allowed), the next gene added is the one that maximizes the distance between the two survival curves. The signature grows until no remaining gene can improve the distance between the survival curves.
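The greedy search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the inputs are simulated, the two groups are obtained with pam() using k = 2, and the distance between the survival curves is taken to be the log-rank chi-square statistic from survdiff(). The names curveDistance and growSignature are hypothetical, and the cluster-size and crossing checks described in this section are left out for brevity.

```r
library(survival)  # Surv(), survdiff()
library(cluster)   # pam()

# Simulated inputs (not the package data): 60 samples x 8 genes.
set.seed(1)
n <- 60
geData <- matrix(rnorm(n * 8), nrow = n,
                 dimnames = list(NULL, paste0("gene", 1:8)))
group <- rep(c(0, 1), each = n / 2)
geData[, "gene1"] <- geData[, "gene1"] + 4 * group  # gene1 separates the groups
stData <- Surv(time = rexp(n, rate = ifelse(group == 1, 0.1, 1)),
               event = rep(1, n))

# Assumed distance: log-rank chi-square between the two PAM clusters
# computed on the selected gene columns.
curveDistance <- function(genes) {
  clusters <- pam(geData[, genes, drop = FALSE], k = 2)$clustering
  survdiff(stData ~ clusters)$chisq
}

# Greedy forward search: add the best gene while the distance improves.
growSignature <- function(seed) {
  signature <- seed
  best <- curveDistance(signature)
  repeat {
    candidates <- setdiff(colnames(geData), signature)
    if (length(candidates) == 0) break
    scores <- vapply(candidates,
                     function(g) curveDistance(c(signature, g)),
                     numeric(1))
    if (max(scores) <= best) break  # no gene improves the distance
    signature <- c(signature, candidates[which.max(scores)])
    best <- max(scores)
  }
  signature
}

signature <- growSignature("gene1")
```

The loop stops as soon as the best candidate fails to increase the score, which mirrors the stopping rule stated above; any other distance measure could be substituted for curveDistance.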
A gene (candidateGene) can be added to the running signature only if it passes two
checks. Given the classification computed on the gene expressions of
candidateGene + runningSignature: 1) no cluster may contain fewer than
floor(0.1 * nrow(geData)) samples, and 2) the survival curves must not cross. When more
than one candidate gene is proposed and the number of candidates is greater than
0.01 * ncol(geData), the search stops; otherwise a subset of the candidates is
selected using a backward strategy.
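The two checks and the candidate-count rule can be sketched as below. The helper names (clustersLargeEnough, kmCurve, curvesCross, tooManyCandidates) are hypothetical. The crossing test is an assumption: it compares the Kaplan-Meier estimates of the two groups at every observed time and reports a crossing when the difference changes sign; the package may detect crossings differently.

```r
library(survival)  # Surv(), survfit()

# Check 1: no cluster smaller than floor(0.1 * number of samples).
clustersLargeEnough <- function(clusters) {
  min(table(clusters)) >= floor(0.1 * length(clusters))
}

# Kaplan-Meier curve of a Surv object as a right-continuous step function.
kmCurve <- function(st) {
  fit <- survfit(st ~ 1)
  stepfun(fit$time, c(1, fit$surv))
}

# Check 2 (assumed definition): the curves cross if the difference
# between the two estimates takes both signs over the observed times.
curvesCross <- function(stData, clusters, tol = 1e-12) {
  s1 <- kmCurve(stData[clusters == 1])
  s2 <- kmCurve(stData[clusters == 2])
  times <- sort(unique(stData[, "time"]))
  d <- s1(times) - s2(times)
  any(d > tol) && any(d < -tol)
}

# Stopping rule applied when several candidate genes are proposed.
tooManyCandidates <- function(nCandidates, nGenes) {
  nCandidates > 0.01 * nGenes
}

# Small deterministic example: every sample experiences the event.
clusters   <- rep(c(1, 2), each = 4)
stCrossing <- Surv(c(1, 2, 9, 10, 4, 5, 6, 7), rep(1, 8))
stOrdered  <- Surv(c(1, 2, 3, 4, 6, 7, 8, 9), rep(1, 8))
crossingResult <- curvesCross(stCrossing, clusters)
orderedResult  <- curvesCross(stOrdered, clusters)
```

In the first toy data set cluster 1 starts below cluster 2 and ends above it, so the curves cross; in the second, cluster 1 stays at or below cluster 2 throughout.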
The parameter coeffMissingAllowed controls an empirical rule intended to prevent the pam() function from failing. The number of joint missing values allowed in a sample described by p gene expression levels is floor(p^coeffMissingAllowed).
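The rule can be written directly from the formula. In this sketch the function names maxMissingAllowed and sampleIsUsable are hypothetical, and the value 0.5 for coeffMissingAllowed is chosen only to make the arithmetic easy to follow; it is not claimed to be the package default.

```r
# Largest number of missing values tolerated in a sample described by p genes.
maxMissingAllowed <- function(p, coeffMissingAllowed) {
  floor(p^coeffMissingAllowed)
}

# TRUE if a sample (one row restricted to the signature genes) has
# no more missing values than the rule allows.
sampleIsUsable <- function(x, coeffMissingAllowed) {
  sum(is.na(x)) <= maxMissingAllowed(length(x), coeffMissingAllowed)
}

# With p = 16 genes and a coefficient of 0.5, floor(16^0.5) = 4.
limit <- maxMissingAllowed(16, 0.5)

sampleOk  <- c(rep(NA, 4), rnorm(12))  # 4 missing values out of 16
sampleBad <- c(rep(NA, 5), rnorm(11))  # 5 missing values out of 16
okResult  <- sampleIsUsable(sampleOk, 0.5)
badResult <- sampleIsUsable(sampleBad, 0.5)
```

Because the exponent is applied to p, the tolerated number of missing values grows more slowly than the signature itself whenever the coefficient is below 1.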