poolHelper

Simulates Pooled Sequencing Genetic Data

This is the source code for the poolHelper R package. The main goal of this package is to provide tools to help researchers design their pooled sequencing studies.

The package provides functions to simulate pooled sequencing (Pool-seq) data under a variety of conditions. Users can define the average coverage, the number of individuals in the pool, the number of pools used and the Pool-seq error. The poolHelper package simulates the allele frequencies obtained with Pool-seq under different combinations of those parameters. These allele frequencies are then compared with the allele frequencies computed directly from the genotypes in the sample and the average absolute difference between both sets of allele frequencies is calculated.
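
The comparison described above can be sketched in base R. This is a minimal illustration under simplifying assumptions, not the package's implementation: `sim_pool_mae` and its arguments are hypothetical names, every individual is assumed to contribute equally to the pool, coverage is assumed to be Poisson distributed, and the only error modelled is a symmetric per-read sequencing error.

```r
# Base-R sketch only; none of these names are poolHelper functions.
set.seed(42)

sim_pool_mae <- function(n_ind = 50, n_snps = 500, mean_cov = 30, seq_error = 0.001) {
  # True population allele frequencies, one per SNP (assumed uniform here).
  p <- runif(n_snps, min = 0.05, max = 0.95)

  # Diploid genotypes: rows are individuals, columns are SNPs,
  # entries count alternative alleles (0, 1 or 2).
  geno <- matrix(rbinom(n_ind * n_snps, size = 2, prob = rep(p, each = n_ind)),
                 nrow = n_ind, ncol = n_snps)

  # Allele frequencies computed directly from the genotypes.
  freq_geno <- colMeans(geno) / 2

  # Coverage per SNP (assumed Poisson), kept at 1 or more to avoid dividing by zero.
  depth <- pmax(rpois(n_snps, lambda = mean_cov), 1)

  # Probability that a read shows the alternative allele,
  # allowing for a symmetric sequencing error.
  p_read <- freq_geno * (1 - seq_error) + (1 - freq_geno) * seq_error

  # Pool-seq allele frequencies: alternative reads divided by total reads.
  alt_reads <- rbinom(n_snps, size = depth, prob = p_read)
  freq_pool <- alt_reads / depth

  # Average absolute difference between the two sets of frequencies.
  mean(abs(freq_pool - freq_geno))
}

mae_low  <- sim_pool_mae(mean_cov = 10)
mae_high <- sim_pool_mae(mean_cov = 100)
```

Comparing `mae_low` with `mae_high` shows how the average absolute difference responds to coverage; under these assumptions the higher-coverage run should give the smaller value. The package's own functions additionally model the number of pools and unequal contributions to the pool, which this sketch leaves out.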

Install

install.packages('poolHelper')

Monthly Downloads

191

Version

1.1.0

License

GPL (>= 3)

Maintainer

João Carvalho

Last Published

June 29th, 2023

Functions in poolHelper (1.1.0)

strg2sync

Create sync string for a single SNP
forcePool

Randomly select the required number of loci from the pooled sequencing data
calculatePi

Calculate population frequency at each SNP
filterPool

Filter Pool-seq data according to a minor-allele reads threshold
filterMinor

Filter sites according to a minor-allele reads threshold
pool2sync

Create 'synchronized' file from Pool-seq data
numberReferencePop

Compute the number of reference reads for multiple populations
findMinor

Define major and minor alleles
GetGenotypes

Create genotypes from an output with haplotypes
pool2vcf

Create VCF file from Pool-seq data
popReads

Compute number of reads for each individual and across all sites
Ifreqs

Compute allele frequencies from genotypes
indProbs

Probability of contribution of each individual
maePool

Average absolute difference between allele frequencies
poolProbs

Probability of contribution of each pool
poolPops

Create Pooled DNA sequencing data for multiple populations
simPoolseq

Simulate Pool-seq data
hap2geno

Convert haplotypes to genotypes
getNumReadsR_vector

Compute the number of reference reads
popsReads

Simulate total number of reads for multiple populations
maeHet

Average absolute difference between the expected heterozygosity computed from genotypes and from Pool-seq data
vcflocus

Create VCF string for all SNPs in a single locus
mymae

Average absolute difference between allele frequencies computed from genotypes supplied by the user and from Pool-seq data
removeSites

Apply a minor allele reads threshold
vcfinfo

Create VCF table with relevant information
remove_by_reads

Apply a coverage-based filter over a list
simulateCoverage

Simulate total number of reads per site
remove_by_reads_matrix

Apply a coverage-based filter to a matrix
simReads

Simulate coverage at a single locus
numberReference

Compute the number of reference reads at multiple loci
strg2vcf

Create VCF string for a single SNP
splitMatrix

Split matrix of genotypes
run_scrm

Simulate a single population
poolReads

Reads contributed by each pool
vcfloci

Create VCF string for all SNPs in multiple loci
ExpHet_site

Compute expected heterozygosity per site
computeReference

Compute the number of reference reads over a matrix
Pfreqs

Compute allele frequencies from pooled sequencing data
Expected_Het

Compute expected heterozygosity within a population
haplo.fix

Create invariable sites
errorHet

Average absolute difference between expected heterozygosity
maeFreqs

Average absolute difference between allele frequencies computed from genotypes and from Pool-seq data
indReads

Reads contributed by each individual
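
Several of the functions listed above compare expected heterozygosity computed from genotypes with the value computed from Pool-seq data. The base-R sketch below illustrates that kind of comparison on toy data. It assumes the simple per-site estimator 2p(1 - p) with no sample-size correction, and `exp_het` is a hypothetical helper, not the package's `ExpHet_site` or `maeHet`.

```r
# Hypothetical helper, not part of poolHelper: per-site expected
# heterozygosity from allele counts, using 2 * p * (1 - p).
exp_het <- function(alt_count, total_count) {
  p <- alt_count / total_count
  2 * p * (1 - p)
}

# Toy genotypes: 4 diploid individuals (rows) at 3 SNPs (columns),
# coded as the number of alternative alleles.
geno <- matrix(c(0, 1, 2, 1,
                 0, 0, 1, 0,
                 2, 2, 1, 2), nrow = 4)
het_geno <- exp_het(colSums(geno), 2 * nrow(geno))

# Toy Pool-seq read counts at the same 3 SNPs.
alt_reads <- c(9, 3, 18)
depth     <- c(20, 20, 20)
het_pool  <- exp_het(alt_reads, depth)

# Average absolute difference between the two heterozygosity estimates.
mae_het <- mean(abs(het_pool - het_geno))
```

Here `het_geno` and `het_pool` each hold one value per SNP, and `mae_het` summarises how far the pooled estimate is from the genotype-based one.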