newSCESet: Create a new SCESet object.

Description

Scater requires that all data be housed in SCESet objects. SCESet extends Bioconductor's ExpressionSet class, and the same basic interface is supported. newSCESet() expects a matrix of expression values as its first argument, with rows as features (usually genes) and columns as cells. Per-feature and per-cell metadata can be supplied with the featureData and phenoData arguments, respectively. Use of these optional arguments is strongly encouraged. The SCESet also includes a slot 'counts' to store an object containing raw count data.

Usage

newSCESet(exprsData = NULL, countData = NULL, tpmData = NULL, fpkmData = NULL, cpmData = NULL, phenoData = NULL, featureData = NULL, experimentData = NULL, is_exprsData = NULL, lowerDetectionLimit = 0, logExprsOffset = 1, logged = FALSE, useForExprs = "exprs")

Arguments

exprsData

expression data matrix for an experiment

countData

data matrix containing raw count expression values

tpmData

matrix of class "numeric" containing transcripts-per-million (TPM) expression values

fpkmData

matrix of class "numeric" containing fragments per kilobase of exon per million reads mapped (FPKM) expression values

cpmData

matrix of class "numeric" containing counts per million (CPM) expression values (optional)

phenoData

data frame containing attributes of individual cells

featureData

data frame containing attributes of features (e.g. genes)

experimentData

MIAME class object containing metadata and details about the experiment and dataset.

is_exprsData

matrix of class "logical", indicating whether or not each observation is above the lowerDetectionLimit.

lowerDetectionLimit

the minimum expression level that constitutes true expression. Defaults to zero; count data are used to determine whether or not an observation is expressed.

logExprsOffset

numeric scalar, providing the offset used when doing log2-transformations of expression data to avoid trying to take logs of zero. Default offset value is 1.

logged

logical; if a value is supplied for the exprsData argument, indicates whether or not the expression values are already on the log2 scale. Defaults to FALSE.

useForExprs

character string, either 'exprs' (default), 'tpm', 'counts' or 'fpkm', indicating which expression representation both internal methods and external packages should use when performing analyses.
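
As an illustration of how is_exprsData relates to lowerDetectionLimit, the following base R sketch derives a logical matrix from a small made-up count matrix. It assumes that "above" means strictly greater than the limit; the object names and values are hypothetical and the code does not call scater itself.

```r
# Toy count matrix: rows are features (genes), columns are cells.
counts <- matrix(
  c(0, 3, 12,
    1, 0, 7),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2))
)

# Assumed interpretation: an observation is "expressed" when it is
# strictly greater than the detection limit (default 0).
lowerDetectionLimit <- 0
is_exprs <- counts > lowerDetectionLimit

# Number of detected features per cell.
detected_per_cell <- colSums(is_exprs)
print(is_exprs)
print(detected_per_cell)
```

A matrix of this shape (same dimensions as the expression matrix, class "logical") is what the is_exprsData argument expects when it is supplied explicitly.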

Value

a new SCESet object

Details

SCESet objects store a matrix of expression values. These values are typically transcripts-per-million (TPM), counts-per-million (CPM), fragments per kilobase per million mapped (FPKM) or some other output from a program that calculates expression values from RNA-Seq reads. We recommend that expression values on the log2 scale are used for the 'exprs' slot in the SCESet. For example, you may wish to store raw TPM values in the 'tpm' slot and log2(TPM + 1) values in the 'exprs' slot. However, expression values could also be values from a single-cell qPCR run or some other type of assay. The newSCESet function can also accept raw count values. In this case see calculateTPM and calculateFPKM for computing TPM and FPKM expression values, respectively, from counts. The function cpm from the package edgeR can be used to compute log2(counts-per-million), if desired.
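
The recommendation to keep raw TPM values alongside log2-scale values can be sketched in base R as follows. The matrix and its values are made up for illustration, and the newSCESet() call is shown only as a comment because it requires a scater version that still provides this function.

```r
# Toy TPM matrix: rows are features (genes), columns are cells.
tpm <- matrix(
  c(0, 10, 250,
    3, 0, 1200),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2))
)

# log2(TPM + 1): the offset of 1 avoids taking the log of zero.
log_tpm <- log2(tpm + 1)
print(log_tpm)

# With a scater version that provides newSCESet(), the raw values could go
# in the 'tpm' slot and the log2 values in the 'exprs' slot, for example:
# sce <- newSCESet(exprsData = log_tpm, tpmData = tpm, logged = TRUE)
```

Setting logged = TRUE in the commented call reflects that the values passed as exprsData are already on the log2 scale.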

An SCESet object has to have the 'exprs' slot defined, so if the exprsData argument is NULL, then this function will define 'exprs' with the following order of precedence: log2(TPM + logExprsOffset), if tpmData is defined; log2(FPKM + logExprsOffset), if fpkmData is defined; otherwise log2(counts-per-million + logExprsOffset) is used. The cpm function from the edgeR package is used to compute CPM. Note that for many analyses counts-per-million are not recommended, and if possible transcripts-per-million should be used.
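
The order of precedence described above can be sketched with a small base R helper. The function default_exprs() is hypothetical and is not part of scater; in particular, its counts-per-million step is a plain library-size scaling used as a stand-in for edgeR's cpm function, so results may differ from what newSCESet() produces.

```r
# Hypothetical helper (not part of scater) mirroring the documented
# order of precedence: TPM first, then FPKM, then counts-per-million.
default_exprs <- function(tpmData = NULL, fpkmData = NULL,
                          countData = NULL, logExprsOffset = 1) {
  if (!is.null(tpmData)) {
    return(log2(tpmData + logExprsOffset))
  }
  if (!is.null(fpkmData)) {
    return(log2(fpkmData + logExprsOffset))
  }
  if (is.null(countData)) {
    stop("one of tpmData, fpkmData or countData must be supplied")
  }
  # Simplified CPM: divide each column (cell) by its total count and
  # scale to one million. This is a stand-in for edgeR's cpm function.
  cpm <- sweep(countData, 2, colSums(countData), "/") * 1e6
  log2(cpm + logExprsOffset)
}

gene_names <- paste0("gene", 1:3)
cell_names <- paste0("cell", 1:2)
counts <- matrix(c(5, 0, 95, 20, 30, 50), nrow = 3,
                 dimnames = list(gene_names, cell_names))
tpm <- matrix(c(100, 0, 900, 250, 250, 500), nrow = 3,
              dimnames = list(gene_names, cell_names))

# Only counts available: falls through to the CPM branch.
exprs_from_counts <- default_exprs(countData = counts)

# TPM supplied as well: TPM takes precedence over counts.
exprs_from_tpm <- default_exprs(tpmData = tpm, countData = counts)
```

Because the TPM branch returns first, supplying tpmData means the count matrix plays no part in the resulting 'exprs' values in this sketch.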

In many downstream functions you will likely find it most convenient if the 'exprs' values are on the log2-scale, so this is recommended.

Examples

data("sc_example_counts")
data("sc_example_cell_info")
pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)
example_sceset
