contamination.stats: Contamination filtering.

Description

Occasionally, DNA or RNA libraries contaminate each other. To address this issue and estimate the contamination rate, tcR offers the contamination.stats and decontamination functions. The decontamination function takes the data (either a data frame or a list of data frames) and a limit on the ratio of clonal counts as arguments. It searches the second data frame for clones similar to those in the first one (or performs pairwise searches if the given data is a list) and removes from the first data frame those clones that are found in the second one with a count at least .limit times their count in the first one (see .limit below). The contamination.stats function returns the number of clones that would be removed by the decontamination function.
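The filtering rule can be sketched in base R as follows. This is a minimal illustration, not tcR's implementation: it assumes exact matching on 'CDR3.nucleotide.sequence', one row per sequence in each data frame, and the ratio rule stated under .limit. The functions decontaminate_sketch and contamination_count_sketch are hypothetical names introduced only for this example.

```r
# Hypothetical sketch of the decontamination rule (not tcR's actual code).
# Assumes exact sequence matching and one row per sequence in each data frame.
decontaminate_sketch <- function(.data1, .data2, .limit = 20, .col = "Read.count") {
  seq_col <- "CDR3.nucleotide.sequence"
  # Row of .data2 holding each .data1 sequence (NA if the sequence is absent)
  idx <- match(.data1[[seq_col]], .data2[[seq_col]])
  # Ratio (count in .data2) / (count in .data1); NA for absent sequences
  ratio <- .data2[[.col]][idx] / .data1[[.col]]
  # Flag clones present in .data2 whose ratio reaches .limit
  contaminated <- !is.na(ratio) & ratio >= .limit
  .data1[!contaminated, , drop = FALSE]
}

# Hypothetical analogue of contamination.stats: how many clones would be removed
contamination_count_sketch <- function(.data1, .data2, .limit = 20, .col = "Read.count") {
  nrow(.data1) - nrow(decontaminate_sketch(.data1, .data2, .limit, .col))
}

d1 <- data.frame(CDR3.nucleotide.sequence = c("AAA", "CCC", "GGG"),
                 Read.count = c(5, 100, 40), stringsAsFactors = FALSE)
d2 <- data.frame(CDR3.nucleotide.sequence = c("AAA", "CCC", "TTT"),
                 Read.count = c(1000, 150, 80), stringsAsFactors = FALSE)

decontaminate_sketch(d1, d2)        # "AAA" is dropped: 1000 / 5 = 200 >= 20
contamination_count_sketch(d1, d2)  # 1
```

Here "CCC" is kept because 150 / 100 = 1.5 is below the limit, and "GGG" is kept because it does not occur in d2 at all.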

Usage

contamination.stats(.data1, .data2, .limit = 20, .col = 'Read.count')
decontamination(.data1, .data2, .limit = 20, .col = 'Read.count', .symm = T)

Arguments

.data1

First data frame, with columns 'CDR3.nucleotide.sequence' and 'Read.count'. It will be checked for contamination.

.data2

Second data frame, with the same columns. It is used to find sequences that may have contaminated the first one.

.limit

Filtering threshold: every sequence from .data1 that is also present in .data2 and satisfies (count of seq in .data2) / (count of seq in .data1) >= .limit is removed.
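Written as a formula, with $c_1(s)$ and $c_2(s)$ denoting the counts (taken from column .col) of sequence $s$ in .data1 and .data2, the rule reads as below; the two numeric cases are made-up values for illustration with the default .limit = 20.

```latex
s \text{ is removed from .data1} \iff s \in \texttt{.data2} \ \wedge\ \frac{c_2(s)}{c_1(s)} \ge \texttt{.limit}

c_1 = 3,\ c_2 = 90:\quad \frac{90}{3} = 30 \ge 20 \;\Rightarrow\; \text{removed}
\qquad
c_1 = 10,\ c_2 = 150:\quad \frac{150}{10} = 15 < 20 \;\Rightarrow\; \text{kept}
```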

.col

Name of the column with clonal counts.

.symm

If T, filter out contaminating sequences from .data1 and then from .data2; otherwise, filter only .data1.
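The effect of .symm can be sketched by applying the same rule in both directions. This is a hypothetical illustration, not tcR's code: filter_one and decontaminate_symm_sketch are names made up for this example, and it assumes that each pass compares against the original (unfiltered) other data frame and that the symmetric result is an unnamed list of two data frames.

```r
# Hypothetical one-direction filter: drop rows of x whose sequence occurs in y
# with (count in y) / (count in x) >= .limit. Assumes exact sequence matching.
filter_one <- function(x, y, .limit, .col) {
  idx <- match(x$CDR3.nucleotide.sequence, y$CDR3.nucleotide.sequence)
  ratio <- y[[.col]][idx] / x[[.col]]
  x[!(!is.na(ratio) & ratio >= .limit), , drop = FALSE]
}

# Hypothetical symmetric wrapper. Assumption: both passes use the original inputs.
decontaminate_symm_sketch <- function(.data1, .data2, .limit = 20,
                                      .col = "Read.count", .symm = TRUE) {
  d1 <- filter_one(.data1, .data2, .limit, .col)
  if (!.symm) return(d1)
  # Second pass: clones of .data2 that look like contamination coming from .data1
  d2 <- filter_one(.data2, .data1, .limit, .col)
  list(d1, d2)
}

a <- data.frame(CDR3.nucleotide.sequence = c("AAA", "CCC"),
                Read.count = c(5, 100), stringsAsFactors = FALSE)
b <- data.frame(CDR3.nucleotide.sequence = c("AAA", "CCC"),
                Read.count = c(1000, 2), stringsAsFactors = FALSE)

res <- decontaminate_symm_sketch(a, b)
res[[1]]  # "AAA" removed from a (1000 / 5 = 200)
res[[2]]  # "CCC" removed from b (100 / 2 = 50)
```

With .symm = FALSE the sketch returns only the filtered first data frame, mirroring the Value section.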

Value

Filtered .data1, or, if .symm is T, a list with both the filtered .data1 and the filtered .data2.