In most cases, genes will be a single vector or a one-column matrix. However, in some
cases a row of seqAnnot corresponds to two or more genes (e.g. the V and J gene segments
of a single immune sequence). Rather than requiring a separate row for each gene, the
calcVDJcounts function accepts a multi-column matrix for genes. The counts for each
column are tallied separately and then concatenated.
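The per-column tally can be pictured with a minimal base-R sketch. It does not call calcVDJcounts and is not the package's implementation; the toy gene names and the repertoire vector are assumptions made up for illustration.

```r
# Toy data: each row is one sequence; the columns hold its V and J calls.
genes <- cbind(
  V = c("V1-2", "V1-2", "V3-23", "V3-23", "V1-2"),
  J = c("J4",   "J6",   "J4",    "J4",    "J6")
)
# Hypothetical repertoire label for each row.
repertoire <- c("A", "A", "A", "B", "B")

# Tally each gene column separately (genes in rows, repertoires in columns).
perColumn <- lapply(seq_len(ncol(genes)), function(i) {
  unclass(table(gene = genes[, i], repertoire = repertoire))
})

# Stack the per-column tallies into a single count matrix.
counts <- do.call(rbind, perColumn)
print(counts)
```

Because every sequence contributes once per gene column, each repertoire's column total in this sketch is the number of sequences multiplied by the number of gene columns.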
To ensure equal variance across all repertoires, the default RDI metric subsamples every
repertoire to the size of the smallest repertoire, so that all repertoires contain the
same number of sequences. This may result in a loss of power for comparisons between
larger repertoires.
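The idea of subsampling to the smallest repertoire can be sketched in base R as follows. This is only an illustration of the principle under assumed toy data; how and where the package performs its subsampling is not shown in the text.

```r
set.seed(1)  # make the random subsample reproducible

# Toy data: repertoire label for each sequence (sizes 6, 3 and 9).
repertoire <- rep(c("A", "B", "C"), times = c(6, 3, 9))
rowsByRep  <- split(seq_along(repertoire), repertoire)

# Target size: the number of sequences in the smallest repertoire.
n <- min(lengths(rowsByRep))

# Draw n row indices, without replacement, from every repertoire.
# Indexing into idx avoids the sample(x) pitfall when length(idx) == 1.
subsampled <- lapply(rowsByRep, function(idx) {
  idx[sample.int(length(idx), n)]
})

print(lengths(subsampled))
```

In this toy case repertoire C loses six of its nine sequences, which is the loss of power the text refers to.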
To increase power for a given test, it is often useful to calculate the repertoire
counts for only a subset of the repertoires in seqAnnot. This can be done by using the
select and combine parameters to specify which repertoires to include in the analysis.

Both parameters are lists whose entries are each named after one of the columns of
seqAnnot. For select, each entry is a vector defining which values to include (e.g., to
include only visits 1 and 3, you might specify select=list(visit=c("V1","V3")), where
the 'visit' column in seqAnnot contains the values "V1", "V2", and "V3"). In this case,
any rows of genes and seqAnnot that come from a repertoire not specified in select will
be discarded. By default, if select has no entry for a column of seqAnnot, all values
from that column will be included.
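The filtering that select describes can be sketched in base R. The seqAnnot table and its subject column below are toy assumptions, and the loop is an illustration of the described behaviour rather than the package's own code.

```r
# Toy annotation table: one row per sequence.
seqAnnot <- data.frame(
  subject = c("S1", "S1", "S2", "S2", "S1", "S2"),
  visit   = c("V1", "V2", "V3", "V1", "V3", "V2"),
  stringsAsFactors = FALSE
)
genes <- c("V1-2", "V3-23", "V1-2", "V4-34", "V3-23", "V1-2")

select <- list(visit = c("V1", "V3"))

# A row is kept only if it passes every entry of select; columns without
# an entry (here, subject) place no restriction on the rows.
keep <- rep(TRUE, nrow(seqAnnot))
for (col in names(select)) {
  keep <- keep & seqAnnot[[col]] %in% select[[col]]
}

# genes and seqAnnot are filtered together so their rows stay aligned.
genes    <- genes[keep]
seqAnnot <- seqAnnot[keep, , drop = FALSE]
print(seqAnnot)
```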
The combine parameter works in a similar fashion, but instead of a vector describing
which values to include, you specify a vector of regular expressions. Any values of the
seqAnnot column that match a given regular expression will be combined into a single
repertoire (e.g. to combine visits 1 and 3 into a single repertoire, you might specify
combine=list(visit="V[13]")).
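A base-R sketch of the regular-expression grouping is shown here. Two points are assumptions rather than documented behaviour: the merged repertoire is labelled with the pattern itself, and values that match no pattern are left unchanged.

```r
visit   <- c("V1", "V2", "V3", "V1", "V3", "V2")
combine <- list(visit = "V[13]")

# Values matching a pattern are replaced by one shared label (the pattern
# itself is used here, an arbitrary choice); other values are left as-is.
merged <- visit
for (pattern in combine$visit) {
  merged[grepl(pattern, visit)] <- pattern
}
print(merged)
```

After this step, rows from visits 1 and 3 carry the same label and would therefore be tallied as one repertoire.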
The vdjDrop parameter is also useful for limiting which sequences are analyzed. Like
select and combine, this is a named list, but its entries correspond to the columns of
genes. Each entry of vdjDrop is a vector of gene segment names to exclude. All sequences
containing those genes are removed from the analysis before subsampling.
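The effect of vdjDrop on a multi-column genes matrix can be sketched in base R. The gene names are toy values, and exact name matching is an assumption of this sketch; it is not taken from the package's code.

```r
genes <- cbind(
  V = c("V1-2", "V3-23", "V4-34", "V1-2"),
  J = c("J4",   "J6",    "J4",    "J5")
)
vdjDrop <- list(V = "V4-34", J = "J5")

# Flag a row for removal if any of its genes is listed for its column.
dropRow <- rep(FALSE, nrow(genes))
for (col in names(vdjDrop)) {
  dropRow <- dropRow | genes[, col] %in% vdjDrop[[col]]
}

# The whole sequence (row) is removed, not just the offending gene call.
genes <- genes[!dropRow, , drop = FALSE]
print(genes)
```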
Once unwanted rows have been removed, the columns of seqAnnot are concatenated to
generate "repertoire" labels for each row. The repertoire labels are then used to split
the rows of genes, and gene prevalence is tallied within each repertoire.

By default, columns of seqAnnot that are constant after subsetting will not be included
in the label. However, this can be controlled by the simplifyNames parameter. If
simplifyNames is FALSE, all columns of seqAnnot are included when generating labels.
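Label generation can be sketched in base R as follows. The helper makeLabels is hypothetical and named only for illustration, and the "_" separator is an assumption; the text does not say how the package joins the column values.

```r
seqAnnot <- data.frame(
  subject = c("S1", "S1", "S2", "S2"),
  visit   = c("V1", "V1", "V1", "V1"),  # constant after subsetting
  cell    = c("naive", "memory", "naive", "memory"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the annotation columns into one label per row.
makeLabels <- function(annot, simplifyNames = TRUE, sep = "_") {
  if (simplifyNames) {
    # Keep only the columns that take more than one value.
    varies <- vapply(annot, function(x) length(unique(x)) > 1, logical(1))
    annot  <- annot[, varies, drop = FALSE]
  }
  do.call(paste, c(unname(as.list(annot)), sep = sep))
}

print(makeLabels(seqAnnot))
print(makeLabels(seqAnnot, simplifyNames = FALSE))
```

With simplifyNames left at TRUE, the constant visit column is omitted from the labels; with FALSE, every label also carries "V1".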