RADdata: RADdata object constructor

Description

RADdata is used internally by polyRAD import functions to generate objects of the S3 class “RADdata” from read depth data. It is also available at the user level for cases where the data to be imported are not already in a format supported by polyRAD.

Usage

RADdata(alleleDepth, alleles2loc, locTable, possiblePloidies, contamRate,
        alleleNucleotides)
        
# S3 method for RADdata
plot(x, ...)

Value

An object of the S3 class “RADdata”. The following slots are available using the $ operator:

alleleDepth: Identical to the argument provided to the function.
alleles2loc: Identical to the argument provided to the function.
locTable: Identical to the argument provided to the function.
possiblePloidies: The possiblePloidies argument, converted to integer.
locDepth: A matrix with taxa in rows and loci in columns, with read depth summed across all alleles for each locus. Column names are locus numbers rather than locus names. See GetLocDepth for retrieving the same matrix but with locus names as column names.
depthSamplingPermutations: A numeric matrix with taxa in rows and alleles in columns. It is calculated as $\log \binom{locDepth}{alleleDepth}$. This is used as a coefficient for likelihood estimations done by other polyRAD functions (i.e. AddGenotypeLikelihood).
depthRatio: A numeric matrix with taxa in rows and alleles in columns. Calculated as $alleleDepth / locDepth$. Used by other polyRAD functions for rough estimation of genotypes and allele frequency.
antiAlleleDepth: An integer matrix with taxa in rows and alleles in columns. For each allele, the number of reads from the locus that do NOT belong to that allele. Calculated as $locDepth - alleleDepth$. Used for likelihood estimations by other polyRAD functions.
alleleNucleotides: Identical to the argument provided to the function.
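
The derived matrices described above can be reproduced by hand, which may help when checking an imported dataset. The following is a minimal base R sketch of the stated formulas, not polyRAD's internal code; the toy depth values and the helper name expandedLocDepth are assumptions made for illustration.

```r
# Hypothetical toy input: 2 taxa, 4 alleles belonging to 2 loci.
alleleDepth <- matrix(c(10L, 0L, 5L, 8L, 3L, 7L, 7L, 3L), nrow = 2,
                      dimnames = list(c("taxon1", "taxon2"),
                                      c("loc1_0", "loc1_1", "loc2_0", "loc2_1")))
alleles2loc <- c(1L, 1L, 2L, 2L)

# locDepth: read depth summed across alleles within each locus.
# Column names are locus numbers, as described for the locDepth slot.
locDepth <- t(rowsum(t(alleleDepth), alleles2loc))

# Repeat each locus total once per allele so it lines up with alleleDepth.
expandedLocDepth <- locDepth[, as.character(alleles2loc), drop = FALSE]
dimnames(expandedLocDepth) <- dimnames(alleleDepth)

# depthRatio = alleleDepth / locDepth (NaN wherever a locus has zero reads).
depthRatio <- alleleDepth / expandedLocDepth

# antiAlleleDepth = locDepth - alleleDepth.
antiAlleleDepth <- expandedLocDepth - alleleDepth

# depthSamplingPermutations = log(locDepth choose alleleDepth).
depthSamplingPermutations <- lchoose(expandedLocDepth, alleleDepth)
dim(depthSamplingPermutations) <- dim(alleleDepth)
dimnames(depthSamplingPermutations) <- dimnames(alleleDepth)

locDepth
depthRatio
antiAlleleDepth
depthSamplingPermutations
```

Using lchoose rather than log(choose(...)) keeps the calculation on the log scale, which avoids overflow at high read depths.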

The object additionally has several attributes (see attr):

taxa: A character vector listing all taxa names, in the same order as the rows of alleleDepth.
nTaxa: An integer indicating the number of taxa.
nLoc: An integer indicating the number of loci in locTable.
contamRate: Identical to the argument provided to the function.

The plot method performs a principal components analysis with AddPCA if not already done, then plots the first two axes. Points represent individuals (taxa). If mapping population parents have been noted in the object (see SetDonorParent), they are indicated in the plot.

Arguments

alleleDepth: An integer matrix, with taxa in rows and alleles in columns. Taxa names should be included as row names. Each value indicates the number of reads for a given allele in a given taxon. There should be no NA values; use zero to indicate no reads.
alleles2loc: An integer vector with one value for each column of alleleDepth. The number indicates the identity of the locus to which the allele belongs. A locus can have any number of alleles assigned to it (including zero).
locTable: A data frame, where locus names are row names. There must be at least as many rows as the highest value of alleles2loc; each number in alleles2loc corresponds to a row index in locTable. No columns are required, although if provided a column named “Chr” will be used for indicating chromosome identities, a column named “Pos” will be used for indicating physical position, and a column named “Ref” will be used to indicate the reference sequence.
possiblePloidies: A list, where each item in the list is an integer vector (or a numeric vector that can be converted to integer). Each vector indicates an inheritance pattern that SNPs in the dataset might obey. 2 indicates diploid, 4 indicates autotetraploid, c(2, 2) indicates allotetraploid, etc.
contamRate: A number ranging from zero to one (although in practice probably less than 0.01) indicating the expected sample cross-contamination rate.
alleleNucleotides: A character vector with one value for each column of alleleDepth, indicating the DNA sequence for that allele. Typically only the sequence at variable sites is provided, although intervening non-variable sequence can also be provided.
x: A “RADdata” object.
...: Additional arguments to pass to plot, for example col or pch.
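
Before calling the constructor on hand-built data, it can be useful to confirm that the arguments agree with one another. The sketch below checks only the requirements stated in the argument descriptions above; check_raddata_inputs is a hypothetical helper written for this illustration and is not part of polyRAD, and the constructor may perform its own, different checks.

```r
# Hypothetical helper (not a polyRAD function): verifies the documented
# requirements for the main RADdata arguments using base R only.
check_raddata_inputs <- function(alleleDepth, alleles2loc, locTable,
                                 alleleNucleotides) {
  stopifnot(
    is.matrix(alleleDepth), is.integer(alleleDepth),
    !is.null(rownames(alleleDepth)),           # taxa names as row names
    !anyNA(alleleDepth),                       # zero, not NA, for no reads
    is.integer(alleles2loc),
    length(alleles2loc) == ncol(alleleDepth),  # one locus index per allele
    is.data.frame(locTable),
    all(alleles2loc >= 1L),
    max(alleles2loc) <= nrow(locTable),        # indices must be valid rows
    is.character(alleleNucleotides),
    length(alleleNucleotides) == ncol(alleleDepth)
  )
  invisible(TRUE)
}

# Toy inputs, invented for illustration.
depth <- matrix(1:16, nrow = 4,
                dimnames = list(paste0("taxon", 1:4),
                                paste0("loc", c(1, 1, 2, 2), "_", c(0, 1, 0, 1))))
loci <- data.frame(row.names = c("loc1", "loc2"), Chr = c(1, 1),
                   Pos = c(2000456, 5479880))

ok <- check_raddata_inputs(depth, c(1L, 1L, 2L, 2L), loci,
                           c("A", "G", "G", "T"))
ok
```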

Author

Lindsay V. Clark

Details

For a single locus, ideally the string provided in locTable$Ref and all strings in alleleNucleotides are the same length, so that SNPs and indels may be matched by position. The character “-” indicates a deletion with respect to the reference, and can be used within alleleNucleotides. The character “.” is a placeholder where other alleles have an insertion with respect to the reference, and may be used in locTable$Ref and alleleNucleotides. Note that it is possible for the sequence in locTable$Ref to be absent from alleleNucleotides if the reference haplotype is absent from the dataset, as may occur if the reference genome is that of a related species and not the actual study species. For the alleleNucleotides vector, the attribute "Variable_sites_only" indicates whether non-variable sequence in between variants is included; this needs to be FALSE for other functions to determine the position of each variant within the set of tags.
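
As an illustration of the alignment convention described above, the following base R sketch builds a hypothetical locus whose reference and alleles are padded to the same length with “.” and “-”, then checks the lengths and locates the variable positions. The sequences are invented for this example, and setting the "Variable_sites_only" attribute by hand is an assumption about how one might record it rather than a documented requirement.

```r
# Hypothetical single locus: the reference plus three aligned alleles.
locRef <- c(loc1 = "A.CG")     # "." marks where another allele has an insertion
alleleNucleotides <- c("A.CG", # reference haplotype
                       "ATCG", # insertion of T relative to the reference
                       "A.-G") # deletion of C relative to the reference
alleles2loc <- c(1L, 1L, 1L)

# Non-variable flanking bases (A and G) are included in these strings,
# so the attribute is set to FALSE.
attr(alleleNucleotides, "Variable_sites_only") <- FALSE

# Every allele should be the same length as its locus reference.
sameLength <- nchar(alleleNucleotides) == unname(nchar(locRef))[alleles2loc]
all(sameLength)

# Split the aligned strings into one column per position and find the
# positions at which the alleles differ.
siteMatrix <- do.call(rbind, strsplit(alleleNucleotides, ""))
variableSites <- which(apply(siteMatrix, 2,
                             function(site) length(unique(site)) > 1))
variableSites
```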

Examples

# create the dataset
mydepth <- matrix(sample(100, 16), nrow = 4, ncol = 4,
                  dimnames = list(paste("taxon", 1:4, sep = ""),
                  paste("loc", c(1,1,2,2), "_", c(0,1,0,1), sep = "")))
mydata <- RADdata(mydepth, c(1L,1L,2L,2L), 
                  data.frame(row.names = c("loc1", "loc2"), Chr = c(1,1),
                             Pos = c(2000456, 5479880)),
                  list(2, c(2,2)), 0.001, c("A", "G", "G", "T"))

# inspect the dataset
mydata
mydata$alleleDepth
mydata$locDepth
mydata$depthRatio

# the S3 class structure is flexible; other data can be added
mydata$GPS <- data.frame(row.names = attr(mydata, "taxa"),
                         Lat = c(43.12, 43.40, 43.05, 43.27),
                         Long = -c(70.85, 70.77, 70.91, 70.95))
mydata$GPS

# If you have NA in your alleleDepth matrix to indicate zero reads,
# perform the following before running the RADdata constructor:
mydepth[is.na(mydepth)] <- 0L

# plotting a RADdata object
plot(mydata)
