Occasionally, DNA or RNA libraries contaminate each other. To address this issue and estimate the contamination rate, tcR offers the contamination.stats and decontamination functions. The decontamination function receives the data (either data frames or a list of data frames) and a limit for the clonal proportion as arguments.
The function searches the second data frame for clones that also occur in the first one (or performs pairwise searches if the given data is a list). It then removes from the first data frame every clone whose count in the second data frame is at least .limit times its count in the first one. The contamination.stats function returns the number of clones that would be removed by the decontamination function.
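The counting rule described above can be sketched in base R. This is a minimal illustration, not tcR's source code. The function name count_contaminants and the toy data are hypothetical. The sketch assumes that clones are matched by identical CDR3 nucleotide sequences, and it only handles a pair of data frames, not a list.

```r
# Hypothetical sketch of the rule behind contamination.stats (not tcR's code):
# count clones of data1 that also occur in data2 with a count at least
# `limit` times larger than in data1.
count_contaminants <- function(data1, data2, limit = 20, col = "Read.count") {
  seq_col <- "CDR3.nucleotide.sequence"
  # Pair up clones with identical CDR3 nucleotide sequences
  merged <- merge(data1[, c(seq_col, col)], data2[, c(seq_col, col)],
                  by = seq_col, suffixes = c(".1", ".2"))
  # Ratio of the count in data2 to the count in data1
  ratio <- merged[[paste0(col, ".2")]] / merged[[paste0(col, ".1")]]
  sum(ratio >= limit)
}

# Toy data (hypothetical)
d1 <- data.frame(CDR3.nucleotide.sequence = c("TGTGCC", "TGTAGC", "TGTGCT"),
                 Read.count = c(2, 500, 40))
d2 <- data.frame(CDR3.nucleotide.sequence = c("TGTGCC", "TGTAGC"),
                 Read.count = c(1000, 600))
count_contaminants(d1, d2)  # 1: only "TGTGCC" has 1000 / 2 = 500 >= 20
```

The ratio is taken as count in the second data frame over count in the first. A clone present at a low count in .data1 but abundant in .data2 is therefore treated as a likely contaminant.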
contamination.stats(.data1, .data2, .limit = 20, .col = 'Read.count')
decontamination(.data1, .data2, .limit = 20, .col = 'Read.count', .symm = T)
.data1: First data frame, with columns 'CDR3.nucleotide.sequence' and 'Read.count'. It will be checked for contamination.
.data2: Second data frame with the same columns. It is used to find the sequences that contaminated the first one.
.limit: Filtering threshold. All sequences from .data1 that are present in .data2 and for which (count of seq in .data2) / (count of seq in .data1) >= .limit are removed.
.col: Name of the column with clonal counts.
.symm: If T, filter sequences out of .data1 and then out of .data2; otherwise filter only .data1.
Returns the filtered .data1, or a list with both the filtered .data1 and the filtered .data2.
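The filtering and the .symm switch can be sketched in base R as follows. The names drop_contaminants and decontaminate_sketch are hypothetical, and this is not tcR's implementation. The sketch makes two assumptions, either of which tcR's own code may handle differently:

- sequences are unique within each data frame;
- the second pass checks .data2 against the original, unfiltered .data1.

```r
# Hypothetical sketch (not tcR's code): remove clones of `target` whose count
# in `source` is at least `limit` times their count in `target`.
drop_contaminants <- function(target, source, limit, col) {
  seq_col <- "CDR3.nucleotide.sequence"
  # Count of each target clone in `source`; NA if the sequence is absent.
  # Assumes sequences are unique, since match() returns the first hit only.
  src_count <- source[[col]][match(target[[seq_col]], source[[seq_col]])]
  is_contam <- !is.na(src_count) & src_count / target[[col]] >= limit
  target[!is_contam, , drop = FALSE]
}

decontaminate_sketch <- function(data1, data2, limit = 20,
                                 col = "Read.count", symm = TRUE) {
  filtered1 <- drop_contaminants(data1, data2, limit, col)
  if (!symm) return(filtered1)
  # Assumption: data2 is checked against the original, unfiltered data1
  filtered2 <- drop_contaminants(data2, data1, limit, col)
  list(filtered1, filtered2)
}

# Toy data (hypothetical)
d1 <- data.frame(CDR3.nucleotide.sequence = c("TGTGCC", "TGTAGC"),
                 Read.count = c(2, 500))
d2 <- data.frame(CDR3.nucleotide.sequence = c("TGTGCC", "TGTAGC", "TGTTTT"),
                 Read.count = c(1000, 3, 50))
res <- decontaminate_sketch(d1, d2)
res[[1]]$CDR3.nucleotide.sequence  # "TGTAGC"
res[[2]]$CDR3.nucleotide.sequence  # "TGTGCC" "TGTTTT"
```

In this toy example, "TGTGCC" is removed from d1 because it is 500 times more abundant in d2. "TGTAGC" is removed from d2 because it is about 167 times more abundant in d1. With symm = FALSE, only the filtered d1 is returned.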