shared.repertoire: Shared TCR repertoire management and analysis

Description

Generate a repertoire of shared sequences, i.e., sequences present in more than one subject. If a sequence appears more than once in a single repertoire, then only its first occurrence is used for the shared repertoire.

shared.repertoire - make a shared repertoire of sequences from the given list of data frames.

shared.matrix - keep only the columns that hold the per-person counts of sequences and return them as a matrix. That is, this function removes columns such as 'CDR3.amino.acid.sequence', 'V.gene', and 'People'.
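
As a rough illustration of the idea described above, the following base R sketch builds a toy shared table from two made-up repertoires. It is not the package's implementation: the data, the variable names (rep.a, rep.b, toy.shared) and the exact output layout are assumptions chosen for illustration only.

```r
# Toy data: two hypothetical repertoires; column names follow this page.
rep.a <- data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSL", "CASSQ"),
                    Read.count = c(10, 3, 5),
                    stringsAsFactors = FALSE)
rep.b <- data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSF"),
                    Read.count = c(7, 2),
                    stringsAsFactors = FALSE)
datalist <- list(A = rep.a, B = rep.b)

# Keep only the first occurrence of each sequence within a repertoire.
dedup <- lapply(datalist, function(df) {
  df[!duplicated(df$CDR3.amino.acid.sequence), ]
})

# Collect every distinct sequence seen in any repertoire.
all.seqs <- unique(unlist(lapply(dedup, function(df) df$CDR3.amino.acid.sequence),
                          use.names = FALSE))

# One count column per subject; NA where the subject lacks the sequence.
counts <- sapply(dedup, function(df) {
  df$Read.count[match(all.seqs, df$CDR3.amino.acid.sequence)]
})

# Assumed layout: sequence, number of people carrying it, per-subject counts.
toy.shared <- data.frame(CDR3.amino.acid.sequence = all.seqs,
                         People = rowSums(!is.na(counts)),
                         counts,
                         stringsAsFactors = FALSE)
print(toy.shared)
```

In this toy table only "CASSL" is carried by both subjects, and its count for subject A comes from the first of its two rows in rep.a.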

Usage

shared.repertoire(.datalist, .type = 'avrc', .min.ppl = 1, .head = -1,
                  .clear = T, .verbose = T, .by.col = '', .sum.col = '',
                  .max.ppl = length(.datalist))
shared.matrix(.shared.rep)

Arguments

.datalist

List of data frames, one per repertoire.

.type

String of length 4 that specifies how to create the shared repertoire. See "Details" for more information. If supplied, then the parameters .by.col and .sum.col are ignored. If not supplied, then the columns in .by.col and .sum.col are used.

.min.ppl

Minimum number of people who must have a sequence for it to be kept in the shared repertoire.

.head

Argument passed to the head function, which is applied to every data frame before clearing.

.clear

If T, remove all sequences that contain the symbols "~" or "*" (i.e., out-of-frame sequences among amino acid sequences).

.verbose

If T, print progress messages.

.by.col

Character vector with the names of the columns that hold the sequences and their parameters (such as segment); these columns are used to create the shared repertoire.

.sum.col

Character vector of length 1 with the name of the column that holds the count, percentage, or any other numeric characteristic of the sequences; this column is used to create the shared repertoire.

.max.ppl

Maximum number of people who may have a sequence for it to be kept in the shared repertoire.

.shared.rep

Shared repertoire.
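
The .clear, .min.ppl and .max.ppl arguments can be pictured as simple row filters. The base R sketch below is an assumption-laden illustration, not the package's code: the toy table and the names is.clean, in.range and kept are hypothetical, both bounds are assumed to be inclusive, and the two steps are applied to one table for brevity, although the clearing step is described above as acting on the input data frames.

```r
# Hypothetical table with a 'People' column (number of subjects carrying
# each sequence); the values are made up for illustration.
toy <- data.frame(CDR3.amino.acid.sequence = c("CASSL", "CAS~SQ", "CASS*F", "CASSY"),
                  People = c(3, 2, 2, 1),
                  stringsAsFactors = FALSE)

# .clear = T: drop sequences containing "~" or "*".
is.clean <- !grepl("[~*]", toy$CDR3.amino.acid.sequence)

# .min.ppl / .max.ppl: keep sequences found in a bounded number of people
# (bounds assumed inclusive).
min.ppl <- 2
max.ppl <- 3
in.range <- toy$People >= min.ppl & toy$People <= max.ppl

kept <- toy[is.clean & in.range, ]
print(kept)
```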

Value

Data frame for shared.repertoire, matrix for shared.matrix.
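
To make the relation between the two return values concrete, here is a minimal base R sketch of dropping the annotation columns from a toy shared table and converting the rest to a matrix. The table, the subject columns A and B, and the annotation.cols vector are assumptions for illustration; shared.matrix itself may select columns differently.

```r
# Hypothetical shared table: annotation columns plus one count column
# per subject (A and B are made-up subject names).
toy.shared <- data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSQ"),
                         V.gene = c("TRBV9", "TRBV5-1"),
                         People = c(2, 1),
                         A = c(10, 5),
                         B = c(7, NA),
                         stringsAsFactors = FALSE)

# Columns assumed to be annotation rather than per-subject counts.
annotation.cols <- c("CDR3.amino.acid.sequence", "CDR3.nucleotide.sequence",
                     "V.gene", "People")

# Keep the remaining columns and convert them to a numeric matrix.
count.cols <- !(names(toy.shared) %in% annotation.cols)
toy.matrix <- as.matrix(toy.shared[, count.cols, drop = FALSE])
print(toy.matrix)
```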

Details

Parameter .type is a string of length 4, where:

The first character selects the sequence column: 'a' for the "CDR3.amino.acid.sequence" column or 'n' for the "CDR3.nucleotide.sequence" column.
The second character determines whether the V.gene column is used: '0' (zero) takes no additional columns, 'v' takes the "V.gene" column.
The third character selects either UMIs ('u') or reads ('r') when choosing the column with the numeric characteristic (see the fourth character).
The fourth character selects the column used as the numeric characteristic of the sequences. It depends on the third character. Possible values are "c" for the "Umi.count" column (if the third character is "u") or the "Read.count" column (if the third character is "r"), "p" for the "Umi.proportion" / "Read.proportion" column, "r" for the "Rank" column, or "i" for the "Index" column. If "Rank" or "Index" is not in the given repertoire, then it is created with the set.rank function from the "Umi.count" / "Read.count" column.
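
The mapping above can be sketched as a small decoder. The helper decode.type below is hypothetical and not part of the package; it only restates the rules in this section as base R code and returns the column names that a given .type string would select.

```r
# Hypothetical helper (not part of the package): translate a .type string
# into the column names described in this section.
decode.type <- function(.type) {
  stopifnot(is.character(.type), length(.type) == 1, nchar(.type) == 4)
  ch <- strsplit(.type, "")[[1]]

  # First character: which sequence column to use.
  seq.col <- switch(ch[1],
                    a = "CDR3.amino.acid.sequence",
                    n = "CDR3.nucleotide.sequence",
                    stop("first character must be 'a' or 'n'"))

  # Second character: whether to add the V.gene column.
  by.col <- switch(ch[2],
                   "0" = seq.col,
                   v = c(seq.col, "V.gene"),
                   stop("second character must be '0' or 'v'"))

  # Third character: UMIs or reads.
  prefix <- switch(ch[3],
                   u = "Umi",
                   r = "Read",
                   stop("third character must be 'u' or 'r'"))

  # Fourth character: which numeric column to use.
  sum.col <- switch(ch[4],
                    c = paste0(prefix, ".count"),
                    p = paste0(prefix, ".proportion"),
                    r = "Rank",
                    i = "Index",
                    stop("fourth character must be 'c', 'p', 'r' or 'i'"))

  list(by.col = by.col, sum.col = sum.col)
}

# 'avrc': amino acid sequence plus V.gene, with "Read.count" as the numeric column.
str(decode.type("avrc"))
```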

Examples

# NOT RUN {
# immdata is assumed to be a list of repertoire data frames.
# Set the "Rank" column in the data from the "Read.count" column.
# shared.repertoire() does this automatically
# if the "Rank" column is not found.
immdata <- set.rank(immdata)
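# Generate a shared repertoire using the "CDR3.amino.acid.sequence" and
# "V.gene" columns, with "Read.count" as the numeric column ('avrc').
# According to "Details", 'avrr' would select the "Rank" column instead.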
imm.shared.av <- shared.repertoire(immdata, 'avrc')
# }