GenoPop (version 1.0.0)

OneDimSFS: OneDimSFS

Description

This function calculates a one-dimensional site frequency spectrum from a VCF file. It processes the file in batches for efficient memory usage. The user can decide between a folded or unfolded spectrum.
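As a rough illustration of what a one-dimensional site frequency spectrum is, the sketch below tabulates alternate allele counts from a small, made-up genotype matrix in base R. It does not call GenoPop and is not its implementation; the matrix `geno`, the diploid assumption, and the choice to include the 0 and n classes are assumptions made for the example.

```r
# Toy genotype matrix (hypothetical data): rows are variants, columns are
# diploid individuals. Each entry counts the alternate alleles carried (0, 1, 2).
geno <- matrix(
  c(0, 1, 0, 0,
    1, 1, 0, 0,
    2, 2, 1, 2,
    0, 0, 0, 1),
  nrow = 4, byrow = TRUE
)

n_chrom <- 2 * ncol(geno)     # number of sampled chromosomes (assumes diploids)
alt_counts <- rowSums(geno)   # alternate allele count at each variant

# Unfolded spectrum: number of sites at each alternate allele count 0..n_chrom.
# tabulate() counts positive integers, hence the shift by one.
unfolded <- tabulate(alt_counts + 1, nbins = n_chrom + 1)
names(unfolded) <- 0:n_chrom
unfolded
```

The result is a named vector with one entry per allele count class, which matches the general shape described under Value; the exact naming and the classes OneDimSFS reports may differ.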

Usage

OneDimSFS(
  vcf_path,
  folded = FALSE,
  batch_size = 10000,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt",
  exclude_ind = NULL
)

Value

Site frequency spectrum as a named vector

Arguments

vcf_path

Path to the VCF file.

folded

Logical. If TRUE, a folded SFS is returned; if FALSE (the default), an unfolded SFS is returned.
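To make the difference concrete, here is a minimal sketch of how an unfolded spectrum can be folded by combining each derived allele count class with its complement. The helper `fold_sfs` is hypothetical and written for this example; it is not part of GenoPop, and it assumes the unfolded vector covers allele counts 0 through n.

```r
# Hypothetical helper: fold an unfolded SFS covering allele counts 0..n.
fold_sfs <- function(unfolded) {
  n <- length(unfolded) - 1          # number of sampled chromosomes
  k <- 0:floor(n / 2)                # minor allele count classes
  # Add class k to its complement n - k
  folded <- unfolded[k + 1] + unfolded[n - k + 1]
  # For even n the middle class was added to itself; restore the single count
  if (n %% 2 == 0) folded[length(folded)] <- unfolded[n / 2 + 1]
  names(folded) <- k
  folded
}

# Made-up unfolded spectrum for 8 chromosomes (classes 0..8)
unfolded <- c(0, 2, 1, 0, 0, 0, 0, 1, 0)
fold_sfs(unfolded)
```

Folding is the usual choice when the ancestral allele is unknown, since it only depends on which allele is rarer in the sample.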

batch_size

The number of variants to be processed in each batch (default of 10,000 should be suitable for most use cases).
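The idea behind batch processing can be sketched in base R: because the spectrum is a sum of per-site counts, partial spectra from each batch can simply be added together. This is an illustration only and not GenoPop's internal code; the simulated matrix `geno`, the helper `sfs_of`, and the small `batch_size` are assumptions chosen for the example.

```r
set.seed(1)
n_ind <- 5
# Simulated genotypes (hypothetical data): 25 variants x 5 diploid individuals
geno <- matrix(sample(0:2, 25 * n_ind, replace = TRUE), ncol = n_ind)
n_chrom <- 2 * n_ind
batch_size <- 10   # tiny value for illustration only

# Hypothetical helper: unfolded SFS of one block of variants
sfs_of <- function(g) tabulate(rowSums(g) + 1, nbins = n_chrom + 1)

# Walk over the variants in batches and accumulate the partial spectra
starts <- seq(1, nrow(geno), by = batch_size)
sfs <- integer(n_chrom + 1)
for (s in starts) {
  rows <- s:min(s + batch_size - 1, nrow(geno))
  sfs <- sfs + sfs_of(geno[rows, , drop = FALSE])
}
names(sfs) <- 0:n_chrom
sfs
```

Only one batch of genotypes needs to be held in memory at a time, which is why a smaller batch size lowers peak memory use at the cost of more iterations.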

threads

Number of threads to use for parallel processing.

write_log

Logical, indicating whether to write progress logs.

logfile

Path to the log file where progress will be logged.

exclude_ind

Optional vector of individual IDs to exclude from the analysis. If provided, the function removes these individuals from the genotype matrix before calculating the spectrum. Default is NULL, meaning no individuals are excluded.

Examples

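# Locate the example VCF and its tabix index (.tbi) shipped with the package
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
# Calculate the unfolded site frequency spectrum
sfs <- OneDimSFS(vcf_file, folded = FALSE)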