- file
The path to a VCF file to be read. The file can be uncompressed or bgzipped
(using Samtools or Bioconductor). Alternatively, a TabixFile object from
Bioconductor can be supplied.
- phaseSNPs
If TRUE, markers that appear to have come from the same set of reads
will be phased and grouped into haplotypes. Otherwise, each row of the file
will be kept as a distinct marker.
- tagsize
The read length, minus any barcode sequence, that was used for genotyping. In TASSEL,
this is the same as the kmerLength option. This argument is used for grouping
SNPs into haplotypes and is ignored if phaseSNPs = FALSE.
- refgenome
Optional. The name of a FASTA file, or an FaFile object, containing
the reference genome. If provided, it is used when grouping SNPs into
haplotypes to insert the non-variable nucleotides between the variable
nucleotides in the alleleNucleotides slot of the RADdata output.
Ignored if phaseSNPs = FALSE. Useful if exact SNP positions need to be
retained for downstream analysis after genotype calling in polyRAD.
In particular, this argument is necessary if you plan to export genotype calls
back to VCF.
- tol
The proportion by which two SNPs can differ in read depth and still be merged
into one group for phasing. Ignored if phaseSNPs = FALSE.
- al.depth.field
The name of the genotype field in the VCF file that contains read depth at
each allele. This should be "AD" unless your format is very unusual.
- min.ind.with.reads
Integer used for filtering SNPs. To be retained, a SNP must have at least
this many samples with reads.
- min.ind.with.minor.allele
Integer used for filtering SNPs. To be retained, a SNP must have at least
this many samples with the minor allele. When there are more than two
alleles, at least two alleles must have at least this many samples with
reads for the SNP to be retained.
- possiblePloidies
A list indicating inheritance modes that might be encountered in the
dataset. See RADdata.
- taxaPloidy
A single integer, or an integer vector with one value per taxon, indicating
ploidy. See RADdata.
- contamRate
A number indicating the expected sample cross-contamination rate. See
RADdata.
- samples
A character vector containing the names of samples from the file to
export to the RADdata object. The default is all samples.
If a subset is provided, filtering with min.ind.with.reads and
min.ind.with.minor.allele is performed within that subset. Ignored
if a different samples argument is provided within svparam.
- svparam
A ScanVcfParam object to be used with readVcf. The main reasons to change
this from the default are (1) to export additional FIXED or INFO fields from
the file to the locTable slot of the RADdata object, and/or (2) to import
only particular regions of the genome, as specified with the which argument
of ScanVcfParam.
- yieldSize
An integer indicating the number of lines of the file to read at once.
Increasing this number will make the function faster but consume more RAM.
- expectedAlleles
An integer indicating the approximate number of alleles that are expected
to be imported after filtering and phasing. If this number is too low,
the function may slow down considerably. Increasing this number
increases the amount of RAM used by the function.
- expectedLoci
An integer indicating the approximate number of loci that are expected
to be imported after filtering and phasing. If this number is too low,
the function may slow down considerably. Increasing this number
increases the amount of RAM used by the function.
- maxLoci
An integer indicating the approximate maximum number of loci to return. If
provided, the function will stop reading the file once it has found at least
this many loci that pass filtering and phasing. This argument is intended for
generating small RADdata objects for testing purposes and should be left NA
under normal circumstances.