A sleuth is a group of kallistos. Borrowing this terminology, a 'sleuth' object stores a group of kallisto results, and can then operate on them while accounting for covariates, sequencing depth, and technical and biological variance.
sleuth_prep(sample_to_covariates, full_model = NULL, target_mapping = NULL,
aggregation_column = NULL, num_cores = max(1L, parallel::detectCores() -
1L), ...)
sample_to_covariates: a data.frame which contains a mapping from sample (a required column) to some set of experimental conditions or covariates. The column path is also required: a character vector where each element points to the corresponding kallisto output directory. The entries in the sample column should be in the same order as the corresponding entries in path.
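For instance, a minimal sample_to_covariates table might be built as below; the sample names, covariate values, and kallisto output directories are hypothetical placeholders, not values required by sleuth.

```r
# A hypothetical metadata table; replace the sample names, covariates,
# and kallisto output directories with your own.
s2c <- data.frame(
  sample   = c("s1", "s2", "s3", "s4"),
  genotype = c("WT", "WT", "KO", "KO"),
  drug     = c("ctrl", "treated", "ctrl", "treated"),
  path     = file.path("kallisto_out", c("s1", "s2", "s3", "s4")),
  stringsAsFactors = FALSE
)
```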
full_model: an R formula which explains the full model (design) of the experiment, OR a design matrix. It must be consistent with the data.frame supplied in sample_to_covariates. You can fit multiple covariates by joining them with '+' (see example).
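As a sketch, the formula and design-matrix forms of the same two-covariate model can be written as follows (assuming a metadata data.frame s2c with genotype and drug columns):

```r
# Formula interface: two covariates joined with '+'
full_model <- ~genotype + drug

# Equivalent design matrix built from the same metadata with base R
design <- model.matrix(~genotype + drug, data = s2c)
```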
target_mapping: a data.frame that has at least one column, 'target_id', and others that denote the mapping for each target. If it is not NULL, target_mapping is joined with many outputs where it might be useful. For example, you might have columns 'target_id', 'ensembl_gene', and 'entrez_gene' to denote different transcript-to-gene mappings. Note that sleuth_prep will treat all columns as having the 'character' data type.
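A target_mapping might look like the following sketch; all transcript and gene identifiers here are invented placeholders (in practice such a table is typically fetched from an annotation resource):

```r
# Hypothetical transcript-to-gene mapping; identifiers are placeholders.
t2g <- data.frame(
  target_id    = c("ENST0001", "ENST0002", "ENST0003"),
  ensembl_gene = c("ENSG0001", "ENSG0001", "ENSG0002"),
  entrez_gene  = c("111", "111", "222"),
  stringsAsFactors = FALSE
)
so <- sleuth_prep(s2c, ~genotype + drug, target_mapping = t2g)
```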
aggregation_column: a string naming the column in target_mapping used to aggregate targets (typically to summarize the data at the gene level). The aggregation is done using a p-value aggregation method when generating the results table. See sleuth_results for more information.
num_cores: an integer giving the number of computer cores mclapply should use to speed up sleuth preparation.
...: any of several other arguments that can be used as advanced options for sleuth preparation. See details.
Value: a sleuth object containing all kallisto samples, metadata, and summary statistics.
This method takes a list of samples with kallisto results and returns a sleuth object with the defined normalization of the data across samples (the default is the DESeq method; see norm_factors), and then the defined transformation of the data (the default is log(x + 0.5)). It also collects all of the bootstraps for the modeling done using sleuth_fit. This function takes several advanced options that can be used to customize your analysis. Here are the advanced options for sleuth_prep:
Extra arguments related to Bootstrap Summarizing:
extra_bootstrap_summary: if TRUE, compute extra summary statistics for estimated counts. This is not necessary for typical analyses; it is only needed for certain plots (e.g. plot_bootstrap). Default is FALSE.
read_bootstrap_tpm: read and compute summary statistics on bootstraps of the TPM. This is not necessary for typical analyses; it is only needed for some plots (e.g. plot_bootstrap) and if TPM values are used for sleuth_fit. Default is FALSE.
max_bootstrap: the maximum number of bootstrap values to read for each transcript. Setting this lower than the total number of bootstraps available will save some time, but will likely decrease the accuracy of the estimation of the inferential noise.
Advanced Options for Filtering:
filter_fun: the function to use when filtering. This function will be applied to the raw counts on a row-wise basis, meaning that each feature will be considered individually. The default is to filter out any features that do not have at least 5 estimated counts in at least 47% of the samples (see basic_filter for more information). If the preferred filtering method requires a matrix-wide transformation or otherwise needs to consider multiple features simultaneously instead of independently, please consider using filter_target_id below.
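As a sketch, a stricter row-wise filter could be supplied like this; the threshold values below are arbitrary examples, not recommendations:

```r
# Require at least 10 estimated counts in at least half of the samples.
# 'row' is the vector of raw estimated counts for a single feature.
strict_filter <- function(row, ...) {
  mean(row >= 10) >= 0.5
}
so <- sleuth_prep(s2c, ~genotype + drug, filter_fun = strict_filter)
```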
filter_target_id: a character vector of target_ids to filter using methods that can't be implemented using filter_fun. If non-NULL, this will override filter_fun.
Advanced Options for the Normalization Step: (NOTE: Be sure you know what you're doing before you use these options)
normalize: boolean for whether normalization and other steps should be performed. If this is set to FALSE, bootstraps will not be read and transformation of the data will not be done. This should only be set to FALSE if one desires to do a quick check of the raw data. The default is TRUE.
norm_fun_counts: a function to perform between-sample normalization on the estimated counts. The default is the DESeq method. See norm_factors for details.
norm_fun_tpm: a function to perform between-sample normalization on the TPM. The default is the DESeq method. See norm_factors for details.
Advanced Options for the Transformation Step: (NOTE: Be sure you know what you're doing before you use these options)
transform_fun_counts: the transformation that should be applied to the normalized counts. Default is 'log(x + 0.5)' (i.e. natural log with a 0.5 offset).
transform_fun_tpm: the transformation that should be applied to the TPM values. Default is 'x' (i.e. the identity function / no transformation).
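For example, to model on a log2 scale instead of the natural-log default, one might pass a custom transform; this is a sketch under the assumption that a base-2 scale is wanted, not a recommendation:

```r
# Use log2 with the same 0.5 offset instead of the natural-log default,
# e.g. so that downstream effect sizes are on the log2 scale.
so <- sleuth_prep(s2c, ~genotype + drug,
                  transform_fun_counts = function(x) log2(x + 0.5))
```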
Advanced Options for Gene Aggregation:
gene_mode: set this to TRUE to get the old counts-aggregation method for doing gene-level analysis. This requires aggregation_column to be set. If TRUE, this will override the p-value aggregation mode, but will allow for gene-centric modeling, plotting, and results.
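A gene-level run using this counts-aggregation method might look like the sketch below, assuming a hypothetical transcript-to-gene mapping t2g that contains an 'ensembl_gene' column:

```r
# Gene-centric preparation; t2g and its 'ensembl_gene' column are
# illustrative assumptions, not fixed names required by sleuth.
so <- sleuth_prep(s2c, ~genotype + drug,
                  target_mapping = t2g,
                  aggregation_column = "ensembl_gene",
                  gene_mode = TRUE)
```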
See also: sleuth_fit to fit a model, and sleuth_wt or sleuth_lrt to perform hypothesis testing.
# NOT RUN {
# Assume we have run kallisto on a set of samples, and have two treatments,
# genotype and drug.
colnames(s2c)
# [1] "sample" "genotype" "drug" "path"
so <- sleuth_prep(s2c, ~genotype + drug)
# }