poolHelper (version 1.1.0)

vcfinfo: Create vcf table with relevant information

Description

Creates a data frame in the VCF format for all SNPs across all loci in the data set.

Usage

vcfinfo(string, pos = NULL)

Value

A data frame with 10 columns:

chr

Chromosome. Each locus is treated as a different linkage group.

pos

Coordinate. The position of the SNP.

ID

Identifier of the variant.

REF

Reference allele. We assume that the reference allele is always an A. Note that this is not necessarily the major allele.

ALT

Alternative allele. We assume that the alternative allele is always a T.

QUAL

Quality score (Phred-scaled in the VCF specification). We assume that this score is always 100.

FILTER

Whether this SNP passed the quality filters.

INFO

Further information on the variant.

FORMAT

Format of the following column. This column describes how the read counts are coded in the population column.

pop1

Number of reference-allele reads, alternative-allele reads and total depth of coverage observed for this population at this SNP.
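To make the layout of these 10 columns concrete, the following is a minimal base R sketch of a table with the same shape. It is not the poolHelper implementation: the helper `make_vcf_table` is hypothetical, and the values used for ID (`"."`), FILTER (`"PASS"`), INFO (`"."`), FORMAT (`"RO,AO:DP"`) and the default positions (1, 2, ... within each locus) are assumptions. Only REF = A, ALT = T, QUAL = 100 and one chromosome per locus come from the description above.

```r
# Hypothetical sketch of a VCF-like table (not poolHelper's vcfinfo).
# Assumed values: ID = ".", FILTER = "PASS", INFO = ".",
# FORMAT = "RO,AO:DP", and default positions 1..n within each locus.
make_vcf_table <- function(string, pos = NULL) {
  # Treat a single character vector as one locus
  if (!is.list(string)) string <- list(string)
  if (!is.null(pos) && !is.list(pos)) pos <- list(pos)

  tables <- lapply(seq_along(string), function(i) {
    snps <- string[[i]]
    # Assumed default: number SNPs consecutively within each locus
    coords <- if (is.null(pos)) seq_along(snps) else pos[[i]]
    data.frame(
      chr = i,              # each locus is its own linkage group
      pos = coords,
      ID = ".",
      REF = "A",            # reference allele is always A
      ALT = "T",            # alternative allele is always T
      QUAL = 100,           # quality score is always 100
      FILTER = "PASS",
      INFO = ".",
      FORMAT = "RO,AO:DP",  # placeholder label for the R,A:DP coding
      pop1 = snps,          # R,A:DP string for this population
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-locus tables: one row per SNP
  do.call(rbind, tables)
}

loci <- list(c("12,8:20", "30,0:30"), "5,15:20")
vcf <- make_vcf_table(loci)
```

With two loci, the first two rows get `chr = 1` and the third gets `chr = 2`, mirroring the one-linkage-group-per-locus convention.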

Arguments

string

is a character vector or a list where each entry contains a character vector for a different locus. Each entry of this character vector contains the information for a single SNP coded as R,A:DP. The output of the vcflocus or vcfloci functions is the intended input here.

pos

is an optional input (default is NULL). If the actual positions of the SNPs are known, they can be used as input here. When working with a single locus, this should be a numeric vector with each entry corresponding to the position of each SNP. If the data has multiple loci, this should be a list where each entry is a numeric vector with the positions of the SNPs for a different locus.
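The structural requirement on `pos` (it must mirror `string`, with one position per SNP) can be expressed as a small check. This is a hypothetical helper, `check_pos`, written for illustration; it is not part of poolHelper and does not describe how vcfinfo validates its input.

```r
# Hypothetical check (not part of poolHelper): pos must mirror the
# structure of string, with one numeric position per SNP.
check_pos <- function(string, pos) {
  if (is.null(pos)) return(TRUE)  # pos is optional
  if (is.list(string)) {
    # Multiple loci: a list with one numeric vector per locus,
    # each as long as the corresponding character vector
    is.list(pos) && length(pos) == length(string) &&
      all(vapply(pos, is.numeric, logical(1))) &&
      all(lengths(pos) == lengths(string))
  } else {
    # Single locus: a numeric vector with one entry per SNP
    is.numeric(pos) && length(pos) == length(string)
  }
}

one_locus <- c("12,8:20", "30,0:30")
check_pos(one_locus, c(105, 230))            # TRUE
two_loci <- list(c("12,8:20", "30,0:30"), "5,15:20")
check_pos(two_loci, list(c(105, 230), 88))   # TRUE
check_pos(two_loci, c(105, 230, 88))         # FALSE: flat vector for several loci
```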

Details

This function combines the information coded as R,A:DP with other necessary fields, such as the chromosome and position of each SNP and the quality score, among others. Note that in the character string, R is the number of reads of the reference allele, A is the number of reads of the alternative allele and DP is the total depth of coverage. Each row of the data frame corresponds to a different SNP.
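The R,A:DP coding can be unpacked back into numbers with base R alone. The sketch below uses a hypothetical helper, `parse_rad`, that is not part of poolHelper; it assumes every entry follows the R,A:DP pattern exactly, with no missing fields.

```r
# Hypothetical helper (not part of poolHelper): split "R,A:DP" strings
# into a numeric matrix with one row per SNP.
parse_rad <- function(string) {
  # "12,8:20" -> c("12", "8", "20") by splitting on ',' or ':'
  parts <- strsplit(string, "[,:]")
  counts <- matrix(as.numeric(unlist(parts)), ncol = 3, byrow = TRUE)
  colnames(counts) <- c("R", "A", "DP")
  counts
}

snps <- c("12,8:20", "30,0:30", "5,15:20")
counts <- parse_rad(snps)
```

For multiple loci stored as a list, `lapply(string, parse_rad)` returns one such matrix per locus.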