Learn R Programming

polyRAD (version 1.1)

SubsetByLocus: Create RADdata Objects with a Subset of Loci

Description

These functions take a RADdata object as input and generate smaller RADdata objects containing only the specified loci. SubsetByLocus allows the user to specify which loci are kept, whereas SplitByChromosome creates multiple RADdata objects representing chromosomes or sets of chromosomes. RemoveMonomorphicLoci eliminates any loci with fewer than two alleles. RemoveHighDepthLoci eliminates loci that have especially high read depth in order to eliminate false loci originating from repetitive sequence. RemoveUngenotypedLoci is intended for datasets that have been run through PipelineMapping2Parents and may have some genotypes that are missing or non-variable due to how priors were determined.

Usage

SubsetByLocus(object, ...)
# S3 method for RADdata
SubsetByLocus(object, loci, …)

SplitByChromosome(object, ...) # S3 method for RADdata SplitByChromosome(object, chromlist = NULL, chromlist.use.regex = FALSE, fileprefix = "splitRADdata", …) RemoveMonomorphicLoci(object, ...) # S3 method for RADdata RemoveMonomorphicLoci(object, verbose = TRUE, …)

RemoveHighDepthLoci(object, ...) # S3 method for RADdata RemoveHighDepthLoci(object, max.SD.above.mean = 2, verbose = TRUE, …)

RemoveUngenotypedLoci(object, ...) # S3 method for RADdata RemoveUngenotypedLoci(object, removeNonvariant = TRUE, …)

Arguments

object

A RADdata object.

loci

A character or numeric vector indicating which loci to include in the output RADdata object. If numeric, it refers to row numbers in object$locTable. If character, it refers to row names in object$locTable.

chromlist

An optional list indicating how chromosomes should be split into separate RADdata objects. Each item in the list is a vector of the same class as object$locTable$Chr (character or numeric) containing the names of chromosomes that should go into one group. If not provided, each chromosome will be sent to a separate RADdata object.

chromlist.use.regex

If TRUE, the character strings in chromlist will be treated as regular expressions for searching chromosome names. For example, if one wanted all chromosomes beginning with the string "scaffold" to go into one RADdata object, one could include the string "^scaffold" as an item in chromlist and set chromlist.use.regex = TRUE. If FALSE, exact matches to chromosome names will be used.

fileprefix

A character string indicating the prefix of .RData files to export.

max.SD.above.mean

The maximum number of standard deviations above the mean read depth that a locus can be in order to be retained.

verbose

If TRUE, print out information about the original number of loci and the number of loci that were retained. For RemoveHighDepthLoci, a histogram is also plotted showing mean depth per locus, and the cutoff for removing loci.

removeNonvariant

If TRUE, in addition to removing loci where posterior probabilities are missing, loci will be removed where posterior probabilities are uniform across the population.

Additional arguments (none implemented).

Value

SubsetByLocus, RemoveMonomorphicLoci, RemoveHighDepthLoci, and RemoveUngenotypedLoci return a RADdata object with all the slots and attributes of object, but only containing the loci listed in loci, only loci with two or more alleles, only loci without abnormally high depth, or only loci where posterior probabilities are non-missing and variable, respectively.

SplitByChromosome returns a character vector containing file names where .RData files have been saved. Each .RData file contains one RADdata object named splitRADdata.

Details

SubsetByLocus may be useful if the user has used their own filtering criteria to determine a set of loci to retain, and wants to create a new dataset with only those loci. It can be used at any point in the analysis process.

SplitByChromosome is intended to make large datasets more manageable by breaking them into smaller datasets that can be processed independently, either in parallel computing jobs on a cluster, or one after another on a computer with limited RAM. Generally it should be used immediately after data import. Rather than returning new RADdata objects, it saves them individually to separate workspace image files, which can than be loaded one at a time to run analysis pipelines such as IteratePopStruct. GetWeightedMeanGenotypes or one of the export functions can be run on each resulting RADdata object, and the resulting matrices concatenated with cbind.

SplitByChromosome, RemoveMonomorphicLoci, and RemoveHighDepthLoci use SubsetByLocus internally.

See Also

VCF2RADdata, SubsetByTaxon

Examples

Run this code
# NOT RUN {
# load a dataset for this example
data(exampleRAD)
exampleRAD

# just keep the first and fourth locus
subsetRAD <- SubsetByLocus(exampleRAD, c(1, 4))
subsetRAD

# split by groups of chromosomes
exampleRAD$locTable
tf <- tempfile()
splitfiles <- SplitByChromosome(exampleRAD, list(c(1, 4), c(6, 9)),
                                fileprefix = tf)
load(splitfiles[1])
splitRADdata

# filter out monomorphic loci (none removed in example)
filterRAD <- RemoveMonomorphicLoci(exampleRAD)

# filter out high depth loci (none removed in this example)
filterRAD2 <- RemoveHighDepthLoci(filterRAD)

# filter out loci with missing or non-variable genotypes 
# (none removed in this example)
filterRAD3 <- IterateHWE(filterRAD2)
filterRAD3 <- RemoveUngenotypedLoci(filterRAD3)
# }

Run the code above in your browser using DataLab