ROKU(data, upper.limit = 0.25, sort = FALSE)
If sort is TRUE, results are sorted in descending order of the entropy scores.

A numeric matrix when the input data are a data frame or matrix. A numeric
vector when the input data are a numeric vector. Both the matrix and the vector
consist of 1, -1, and 0: 1 for over-expressed outliers, -1 for under-expressed
outliers, and 0 for non-outliers.

A numeric vector when the input data are a data frame or matrix. A numeric
scalar when the input data are a numeric vector. Both the vector and the scalar
consist of original entropy ($H$) score(s) calculated from an original gene
expression vector.

A numeric vector when the input data are a data frame or matrix. A numeric
scalar when the input data are a numeric vector. Both the vector and the scalar
consist of modified entropy ($H'$) score(s) calculated from a processed gene
expression vector.

modH

A value equivalent to the output of tukey.biweight with default parameter
settings in the affy package. The data processing is done by subtracting this
value from each gene expression vector and then taking the absolute value.

Note that the modified entropy does not indicate which tissue a gene is
specific to; it only measures the overall degree of tissue specificity of the gene.
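
The data processing described above can be sketched in base R. This is an illustrative sketch, not the package's code: it assumes that $H$ is the Shannon entropy (base-2 logarithm) of the expression vector rescaled to relative abundances, and it uses a hypothetical helper, tukey_biweight_sketch, as a stand-in for tukey.biweight from the affy package (the constants c = 5 and epsilon = 1e-4 are assumed to match that function's defaults). All function names below are made up for the illustration.

```r
# Hypothetical base-R stand-in for a one-step Tukey biweight.
# The constants c and epsilon are assumptions, not taken from this page.
tukey_biweight_sketch <- function(x, c = 5, epsilon = 1e-4) {
  m <- median(x)                             # starting centre
  s <- median(abs(x - m))                    # median absolute deviation
  u <- (x - m) / (c * s + epsilon)           # scaled distances from the centre
  w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)   # biweight: distant points get weight 0
  sum(w * x) / sum(w)                        # weighted mean
}

# Shannon entropy of a non-negative vector rescaled to relative abundances
# (assumed definition of H).
shannon_entropy <- function(x) {
  p <- x / sum(x)
  p <- p[p > 0]                              # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

x <- c(1.2, 5.1, 5.2, 5.4, 5.7, 5.9, 6.0, 6.3, 8.5, 8.8)

H <- shannon_entropy(x)                      # entropy of the original vector

# Processing step: subtract the biweight centre, then take absolute values.
x_processed <- abs(x - tukey_biweight_sketch(x))
H_mod <- shannon_entropy(x_processed)        # entropy of the processed vector

print(c(H = H, H_mod = H_mod))
```

For this vector the processed values are concentrated in a few tissues, so the entropy of the processed vector is lower than that of the original one, but neither score says which tissues are responsible.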
ROKU employs an AIC-based outlier detection method (Ueda, 1996).
Consider, for example, a hypothetical mixed-type tissue-selective expression
pattern $(1.2, 5.1, 5.2, 5.4, 5.7, 5.9, 6.0, 6.3, 8.5, 8.8)$ in which a total
of three tissues are specific (down-regulated in tissue 1; up-regulated in
tissues 9 and 10). The method first normalizes the expression values by
subtracting the mean and dividing by the standard deviation (i.e., $z$-score
transformation), and then sorts them in increasing order, giving
$(-2.221, -0.342, -0.294, -0.198, -0.053, 0.043, 0.092, 0.236, 1.296,
1.441)$. The method evaluates various combinations of outlier candidates
starting from both ends of the sorted values: model 1 for no outliers,
model 2 for one outlier on the high side, model 3 for two outliers on the high
side, ..., model $x$ for one outlier on the low side, ..., model $y$ for two
outliers on both the high and low sides, and so on. It then calculates an
AIC-like statistic (called $U$) for each model and searches for the model that
achieves the lowest $U$ value, which is termed the minimum AIC estimate
(MAICE). Because the upper.limit value corresponds to the maximum number of
outlier candidates, it determines the number of combinations evaluated. The
AIC-based method outputs a vector (1 for up-regulated outliers, -1 for
down-regulated outliers, and 0 for non-outliers) that corresponds to the input
vector. For example, the method outputs the vector
$(-1, 0, 0, 0, 0, 0, 0, 0, 1, 1)$ when using upper.limit = 0.5 and
$(-1, 0, 0, 0, 0, 0, 0, 0, 0, 0)$ when using upper.limit = 0.25 (the default).
See Kadota et al. (2007) for a detailed discussion of the effect of different
parameter settings.
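
The search over outlier models can be sketched in base R as follows. This is a simplified illustration, not the code used by ROKU. It rests on several assumptions: the statistic is taken to be $U = n \log \sigma + \sqrt{2}\, s \log(n!)/n$ (as recalled from Kadota et al., 2003), where $n$ is the number of non-outliers, $s$ the number of outliers, and $\sigma$ the sample standard deviation of the non-outliers; the total number of outliers is assumed to be capped at floor(N * upper.limit); and the function name aic_outliers_sketch is hypothetical.

```r
# Illustrative sketch of an AIC-like outlier search (assumed form of U, see above).
aic_outliers_sketch <- function(x, upper.limit = 0.25) {
  n_total <- length(x)
  max_out <- floor(n_total * upper.limit)    # assumed cap on the number of outliers
  ord <- order(x)                            # positions of values, smallest first
  z <- (x[ord] - mean(x)) / sd(x)            # sorted z-scores

  best <- list(U = Inf, low = 0, high = 0)
  for (low in 0:max_out) {                   # outlier candidates on the low side
    for (high in 0:(max_out - low)) {        # outlier candidates on the high side
      keep <- z[(low + 1):(n_total - high)]  # non-outliers under this model
      n <- length(keep)
      if (n < 2) next                        # sd() is undefined for fewer than 2 values
      s <- low + high
      U <- n * log(sd(keep)) + sqrt(2) * s * lfactorial(n) / n
      if (U < best$U) best <- list(U = U, low = low, high = high)
    }
  }

  # Map the best model back to the original positions of the input vector.
  flags <- integer(n_total)
  if (best$low > 0)  flags[ord[seq_len(best$low)]] <- -1L
  if (best$high > 0) flags[ord[(n_total - best$high + 1):n_total]] <- 1L
  flags
}

x <- c(1.2, 5.1, 5.2, 5.4, 5.7, 5.9, 6.0, 6.3, 8.5, 8.8)

z_sorted <- round(sort((x - mean(x)) / sd(x)), 3)   # the sorted z-scores quoted above
print(z_sorted)

print(aic_outliers_sketch(x, upper.limit = 0.5))
print(aic_outliers_sketch(x, upper.limit = 0.25))
```

Under these assumptions the sketch flags tissue 1 as a low-side outlier in both settings, and additionally flags tissues 9 and 10 only when upper.limit = 0.5 allows more than two outliers in total. Note that the $z$-score step matters here: because $n$ differs between models, rescaling the data would change the relative $U$ values.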
Kadota K, Ye J, Nakai Y, Terada T, Shimizu K: ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics 2006, 7: 294.
Kadota K, Nishimura SI, Bono H, Nakamura S, Hayashizaki Y, Okazaki Y, Takahashi K: Detection of genes with tissue-specific expression patterns using Akaike's Information Criterion (AIC) procedure. Physiol Genomics 2003, 12: 251-259.
Ueda T: Simple method for the detection of outliers. Japanese J Appl Stat 1996, 25: 17-26.
library(TCC)                  # ROKU and hypoData_ts are provided by the TCC package
data(hypoData_ts)
result <- ROKU(hypoData_ts)