The readVCF function expects a tabixed VCF file with a diploid GT field.
In case of haploid data, the GT field has to be transformed to a pseudo-diploid
field (such as 0 -> 0|0). An alternative is to use readData(..., format="VCF"),
which can read non-tabixed haploid and any kind of polyploid VCFs directly.
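The pseudo-diploid transformation mentioned above can be sketched in base R as follows. This is only an illustration: to_pseudo_diploid is a hypothetical helper, not part of PopGenome, and it assumes the input is a character vector of bare GT values (no further FORMAT subfields such as DP or GQ).

```r
# Hypothetical helper (not a PopGenome function): duplicate haploid GT calls
# so that they look diploid, e.g. "0" -> "0|0" and "1" -> "1|1".
to_pseudo_diploid <- function(gt) {
  # Calls without a phase separator ("|" or "/") are treated as haploid.
  haploid <- !grepl("[|/]", gt)
  gt[haploid] <- paste(gt[haploid], gt[haploid], sep = "|")
  gt
}

gt <- c("0", "1", ".", "0|1")
pseudo <- to_pseudo_diploid(gt)
print(pseudo)
```

Calls that already contain a separator are left unchanged, so the helper can be applied to mixed input. In practice the rewritten VCF would still have to be compressed and indexed with tabix before readVCF can read it.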
When approx=TRUE, the algorithm applies a logical OR to the GT field
(0|0=0, 1|0=1, 0|1=1, 1|1=1). Note that this is an approximation for diploid data
that speeds up calculations. For haploid data, approx should be set to TRUE.
If approx=FALSE, the full diploid information is considered.
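The logical OR applied under approx=TRUE can be illustrated with a short base R sketch. approx_gt is a hypothetical helper written for this illustration and is not the routine PopGenome uses internally; it assumes biallelic GT strings without missing values.

```r
# Hypothetical helper (not PopGenome's internal code): collapse a diploid GT
# call to a single 0/1 value by OR-ing its two alleles.
approx_gt <- function(gt) {
  alleles <- strsplit(gt, "[|/]")
  # A call becomes 1 if at least one allele is the alternative allele "1".
  vapply(alleles, function(a) as.integer(any(a == "1")), integer(1))
}

gt <- c("0|0", "1|0", "0|1", "1|1")
collapsed <- approx_gt(gt)
print(collapsed)
```

Heterozygous and homozygous-alternative calls map to the same value, which is why the result is only an approximation for diploid data.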
The ff package, which PopGenome uses to store the SNP information, limits the total data size to
individuals * (number of SNPs) <= .Machine$integer.max.
For very large data sets, the bigmemory package is used instead;
this slows down calculations (note that bigmemory has to be installed first).
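Whether a data set stays within the size limit stated above can be checked before reading it. The following base R sketch assumes the numbers of individuals and SNPs are already known; fits_in_ff is a hypothetical helper, not a PopGenome function.

```r
# Hypothetical helper (not a PopGenome function): test whether
# individuals * SNPs stays within .Machine$integer.max.
fits_in_ff <- function(n_individuals, n_snps) {
  # Multiply as doubles so the product itself cannot overflow integer range.
  as.numeric(n_individuals) * as.numeric(n_snps) <= .Machine$integer.max
}

small <- fits_in_ff(100, 1e6)    # 1e8 cells
large <- fits_in_ff(2500, 1e6)   # 2.5e9 cells
print(c(small = small, large = large))
```

A FALSE result indicates a data set for which the bigmemory package would be needed.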
Use vcf_handle <- .Call("VCF_open", filename)
to open a VCF file and .Call("VCF_getSampleNames", vcf_handle)
to retrieve the sample names and define the individuals that should be considered in the analysis.
See also readData(..., format="VCF").