Learn R Programming

GenoPop (version 1.0.0)

PrivateAlleles: PrivateAlleles

Description

This function calculates the number of private alleles in two populations from a VCF file. (Alleles which are not present in the other popualtion.) It processes the file in batches or specified windows across the genome. For batch processing, it uses process_vcf_in_batches. For windowed analysis, it uses a similar approach tailored to process specific genomic windows (process_vcf_in_windows).

Usage

PrivateAlleles(
  vcf_path,
  pop1_individuals,
  pop2_individuals,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt",
  batch_size = 10000,
  window_size = NULL,
  skip_size = NULL
)

Value

In batch mode (no window_size or skip_size provided): A list containing the number of private alleles for each population. In window mode (window_size and skip_size provided): A list of data frames, each with columns 'Chromosome', 'Start', 'End', 'PrivateAllelesPop1', and 'PrivateAllelesPop2', representing the count of private alleles within each window for each population.

Arguments

vcf_path

Path to the VCF file.

pop1_individuals

Vector of individual names belonging to the first population.

pop2_individuals

Vector of individual names belonging to the second population.

threads

Number of threads to use for parallel processing.

write_log

Logical, indicating whether to write progress logs.

logfile

Path to the log file where progress will be logged.

batch_size

The number of variants to be processed in each batch (used in batch mode only, default of 10,000 should be suitable for most use cases).

window_size

Size of the window for windowed analysis in base pairs (optional). When specified, skip_size must also be provided.

skip_size

Number of base pairs to skip between windows (optional). Used in conjunction with window_size for windowed analysis.

Examples

Run this code
# Batch mode example
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
pop1_individuals <- c("tsk_0", "tsk_1", "tsk_2")
pop2_individuals <- c("tsk_3", "tsk_4", "tsk_5")
private_alleles <- PrivateAlleles(vcf_file, pop1_individuals, pop2_individuals)

# Window mode example
private_alleles_windows <- PrivateAlleles(vcf_file, pop1_individuals, pop2_individuals,
                                          window_size = 100000, skip_size = 50000)

Run the code above in your browser using DataLab