calc_ehh: EHH and iHH computation for a given focal marker

Description

Compute Extended Haplotype Homozygosity (EHH) and integrated EHH (iHH) for a given focal marker.

Usage

calc_ehh(
  haplohh,
  mrk,
  limhaplo = 2,
  limhomohaplo = 2,
  limehh = 0.05,
  include_zero_values = FALSE,
  include_nhaplo = FALSE,
  phased = TRUE,
  polarized = TRUE,
  scalegap = NA,
  maxgap = NA,
  discard_integration_at_border = TRUE,
  lower_y_bound = limehh,
  interpolate = TRUE
)

Arguments

haplohh

an object of class haplohh (see data2haplohh).

mrk

an integer giving the index of the focal marker within the haplohh object, or a string giving its ID/name.

limhaplo

if fewer than limhaplo chromosomes can be used for the calculation of EHH, the calculation is stopped. The option is intended for the case of missing data, which leads to the successive exclusion of haplotypes: the further from the focal marker, the fewer haplotypes contribute to EHH.

limhomohaplo

if there are fewer than limhomohaplo homozygous chromosomes, the calculation is stopped. This option is intended for unphased data and should be invoked only if relatively low-frequency variants are not filtered subsequently (see main vignette and Klassmann and Gautier 2020).

limehh

limit at which EHH stops being evaluated.

include_zero_values

logical. If FALSE, return values only for those positions where the calculation is actually performed, i.e. until stopped by reaching either limehh or limhaplo. If TRUE, report EHH values for all markers, the additional ones being zero.

include_nhaplo

logical. If TRUE, report the number of evaluated haplotypes at each marker (informative only if missing data reduces the number of evaluated haplotypes).

phased

logical. If TRUE (default), chromosomes are expected to be phased. If FALSE, the haplotype data are assumed to consist of pairwise-ordered chromosomes belonging to diploid individuals. EHH is then estimated over individuals that are homozygous at the focal marker.

polarized

logical. TRUE by default. If FALSE, the major and minor alleles are used instead of the ancestral and derived alleles. If there are more than two alleles, the minor allele refers to the second-most frequent allele.

scalegap

scale or cap gaps larger than the specified size to the specified size (default=NA, i.e. no scaling).

maxgap

maximum allowed gap in bp between two markers. If it is exceeded, further calculation of EHH is stopped at the gap (default = NA, i.e., no limitation).

discard_integration_at_border

logical. If TRUE (default) and the computation reaches the first or last marker, or a gap larger than maxgap, iHH is set to NA.

lower_y_bound

lower y boundary of the area to be integrated over (default: limehh). Can be set to zero for compatibility with the program hapbin.

interpolate

logical. Affects only iHH values. If TRUE (default), integration is performed over a continuous EHH curve (values are linearly interpolated between consecutive markers); otherwise, the EHH curve decreases stepwise at markers.
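To illustrate how scalegap, lower_y_bound and interpolate interact, the following is a minimal base-R sketch of the area under one side of an EHH curve. It is not the package's code: the helper name ihh_one_side is hypothetical, it assumes distances are measured from the focal marker and increase away from it, and it assumes that in the stepwise case the curve keeps the value of the marker nearer the focal marker until the next marker. Under these assumptions, the iHH of an allele would be the sum of its upstream and downstream areas.

```r
# Hypothetical helper (not part of rehh): area between one side of an EHH
# curve and lower_y_bound.
# dist: distances (bp) from the focal marker, increasing, starting at 0
# ehh:  EHH values at those markers, starting at the focal marker
ihh_one_side <- function(dist, ehh, lower_y_bound = 0.05,
                         interpolate = TRUE, scalegap = NA) {
  n <- length(ehh)
  if (n < 2) return(0)
  gaps <- diff(dist)
  # Cap every gap larger than scalegap at scalegap
  if (!is.na(scalegap)) gaps <- pmin(gaps, scalegap)
  # Height of the curve above the lower boundary (may be negative)
  h <- ehh - lower_y_bound
  near <- h[-n]  # marker closer to the focal marker
  far <- h[-1]   # marker further away
  if (!interpolate) {
    # Assumed step function: the curve keeps the value of the nearer
    # marker until the next marker, where it drops
    return(sum(gaps * pmax(near, 0)))
  }
  # Linear interpolation: trapezoids, truncated to the triangle above
  # the boundary when a segment crosses it
  area <- ifelse(near >= 0 & far >= 0, gaps * (near + far) / 2,
          ifelse(near <= 0 & far <= 0, 0,
                 gaps * pmax(near, far)^2 / (2 * abs(near - far))))
  sum(area)
}

d <- c(0, 1000, 2000, 3000)
e <- c(1, 0.6, 0.3, 0.05)
ihh_one_side(d, e)                        # interpolated curve
ihh_one_side(d, e, interpolate = FALSE)   # stepwise curve
ihh_one_side(d, e, lower_y_bound = 0)     # hapbin-like lower bound
```

Because the stepwise curve holds the higher value over each interval of a decreasing EHH curve, it yields a larger area than the interpolated one in this sketch.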

Value

The returned value is a list containing the following elements:

mrk.name: The name/identifier of the focal marker.
freq: A vector with the frequencies of the alleles of the focal marker.
ehh: A data frame with EHH values for each allele of the focal marker.
ihh: A vector with iHH (integrated EHH) values for each allele of the focal marker.

Details

Values for allele-specific Extended Haplotype Homozygosity (EHH) are computed upstream and downstream of the focal marker for each of its alleles. These values are integrated with respect to their genomic positions to yield an 'integrated EHH' (iHH) value for each allele.
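As a rough illustration of the allele-specific EHH described above, the following base-R sketch uses the estimator of Sabeti et al. (2002): among the n chromosomes carrying the focal allele, EHH at a marker is the number of pairs sharing the same extended haplotype between the focal marker and that marker, divided by choose(n, 2). The helper ehh_downstream is hypothetical; it assumes phased data without missing values, a matrix with one row per chromosome and one column per marker ordered by position, and that values below limehh are not returned. The package's own implementation additionally handles missing data, unphased data and the other stopping rules listed under Arguments.

```r
# Hypothetical helper (not rehh's implementation): allele-specific EHH
# downstream of a focal marker, for phased data without missing values.
# haplo: matrix, one row per chromosome, one column per marker (by position)
ehh_downstream <- function(haplo, focal, allele, limehh = 0.05) {
  carriers <- haplo[haplo[, focal] == allele, , drop = FALSE]
  n <- nrow(carriers)
  if (n < 2) stop("fewer than two chromosomes carry the focal allele")
  ehh <- numeric(0)
  for (j in focal:ncol(haplo)) {
    # Extended haplotype of each carrier from the focal marker to marker j
    keys <- apply(carriers[, focal:j, drop = FALSE], 1, paste, collapse = "-")
    counts <- table(keys)
    # Fraction of carrier pairs that share the same extended haplotype
    value <- sum(choose(counts, 2)) / choose(n, 2)
    if (value < limehh) break  # stop once EHH drops below limehh
    ehh <- c(ehh, value)
  }
  ehh
}

haplo <- matrix(c(1, 0, 0,
                  1, 0, 0,
                  1, 0, 1,
                  1, 1, 1,
                  0, 1, 0), ncol = 3, byrow = TRUE)
ehh_downstream(haplo, focal = 1, allele = 1)
```

Under the same assumptions, the upstream values could be obtained by applying the function to the column-reversed matrix, haplo[, ncol(haplo):1], with the focal index set to ncol(haplo) - focal + 1.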

References

Gautier, M. and Naves, M. (2011). Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Molecular Ecology, 20, 3128-3143.

Klassmann, A. and Gautier, M. (2020). Detecting selection using Extended Haplotype Homozygosity-based statistics on unphased or unpolarized data (preprint). https://doi.org/10.22541/au.160405572.29972398/v1

Sabeti, P.C. et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832-837.

Sabeti, P.C. et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913-918.

Tang, K., Thornton, K.R. and Stoneking, M. (2007). A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biology, 5, e171.

Voight, B.F., Kudaravalli, S., Wen, X. and Pritchard, J.K. (2006). A map of recent positive selection in the human genome. PLoS Biology, 4, e72.

Examples


library(rehh)
# Example haplohh object (280 haplotypes, 1424 SNPs);
# see ?haplohh_cgu_bta12 for details
data(haplohh_cgu_bta12)
# Compute EHH statistics for the marker "F1205400",
# which displays a strong signal of selection
ehh <- calc_ehh(haplohh_cgu_bta12, mrk = "F1205400")
# Inspect the allele frequencies, EHH values and iHH values
ehh$freq
head(ehh$ehh)
ehh$ihh
