Compute site-specific Extended Haplotype Homozygosity (EHHS) and integrated EHHS (iES) for a given focal marker.
calc_ehhs(
haplohh,
mrk,
limhaplo = 2,
limhomohaplo = 2,
limehhs = 0.05,
include_zero_values = FALSE,
include_nhaplo = FALSE,
phased = TRUE,
scalegap = NA,
maxgap = NA,
discard_integration_at_border = TRUE,
lower_y_bound = limehhs,
interpolate = TRUE
)
an object of class haplohh (see data2haplohh).
integer representing the number of the focal marker within the haplohh object or string representing its ID/name.
if there are fewer than limhaplo chromosomes that can be used for the calculation of EHHS, the calculation is stopped. The option is intended for the case of missing data, which leads to the successive exclusion of haplotypes: the further away from the focal marker, the fewer haplotypes contribute to EHHS.
if there are fewer than limhomohaplo homozygous chromosomes, the calculation is stopped. This option is intended for unphased data and should be invoked only if relatively low-frequency variants are not filtered subsequently (see main vignette and Klassmann and Gautier 2020).
limit at which evaluation of EHHS stops.
logical. If FALSE, return values only for those positions where the calculation is actually performed, i.e. until stopped by reaching either limehhs or limhaplo. If TRUE, report EHHS values for all markers, the additional ones being zero.
logical. If TRUE, report the number of evaluated haplotypes at each marker (only informative if missing data leads to a decrease in the number of evaluated haplotypes).
logical. If TRUE (default), chromosomes are expected to be phased. If FALSE, the haplotype data is assumed to consist of pairwise ordered chromosomes belonging to diploid individuals. EHHS is then estimated over individuals which are homozygous at the focal marker.
scale or cap gaps larger than the specified size to the specified size (default=NA, i.e. no scaling).
maximum allowed gap in bp between two markers. If exceeded, further calculation of EHHS is stopped at the gap (default=NA, i.e. no limitation).
logical. If TRUE (default) and computation reaches the first or last marker or a gap larger than maxgap, the integrated values (IES and INES) are set to NA.
lower y boundary of the area to be integrated over (default: limehhs). Can be set to zero for compatibility with the program hapbin.
logical. Affects only IES and INES values. If TRUE (default), integration is performed over a continuous EHHS curve (values are interpolated linearly between consecutive markers), otherwise the EHHS curve decreases stepwise at markers.
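The interplay of lower_y_bound, interpolate and scalegap can be illustrated with a small base R sketch. It is not the rehh implementation: the function integrate_curve and its argument layout are hypothetical, positions are assumed to be ordered from the focal marker outward on one side only, and in the stepwise case each value is assumed to hold until the next marker is reached.

```r
# Illustrative sketch only; integrate_curve is a hypothetical helper, not part of rehh.
# pos: marker positions on one side of the focal marker, ordered outward from it
# y:   EHHS values at those markers
integrate_curve <- function(pos, y, lower_y_bound = 0.05,
                            interpolate = TRUE, scalegap = NA) {
  stopifnot(length(pos) == length(y), length(pos) >= 2)
  n <- length(y)
  # Height of the curve above the lower boundary (simplification: clipped at zero)
  h <- pmax(y - lower_y_bound, 0)
  # Distances between consecutive markers
  dx <- abs(diff(pos))
  # Cap gaps larger than scalegap to scalegap (assumed behaviour, see lead-in)
  if (!is.na(scalegap)) dx <- pmin(dx, scalegap)
  if (interpolate) {
    # Linear interpolation between markers: trapezoidal rule
    sum(dx * (h[-n] + h[-1]) / 2)
  } else {
    # Stepwise curve: the value at a marker holds until the next marker
    sum(dx * h[-n])
  }
}

pos <- c(0, 100, 300)
y   <- c(1, 0.5, 0.1)
area_linear <- integrate_curve(pos, y, lower_y_bound = 0)
area_step   <- integrate_curve(pos, y, lower_y_bound = 0, interpolate = FALSE)
area_capped <- integrate_curve(pos, y, lower_y_bound = 0, scalegap = 100)
```

For a decreasing curve the stepwise area is at least as large as the interpolated one, and a positive lower_y_bound reduces both.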
The returned value is a list containing the following elements:
The name/identifier of the focal marker.
A table containing EHHS values as used by Sabeti et al. (2007), together with the same values normalized to 1 at the focal marker (nEHHS) as used by Tang et al. (2007).
Integrated EHHS.
Integrated normalized EHHS.
Values for site-specific Extended Haplotype Homozygosity (EHHS) are computed at each position upstream and downstream of the focal marker. These values are integrated with respect to their genomic position to yield an 'integrated EHHS' (iES) value.
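As a rough illustration of what is computed at each position, the following base R sketch assumes one common formulation of EHHS: the probability that two chromosomes drawn at random without replacement are identical over all markers between the focal marker and the current one. The helper ehhs_sketch is hypothetical and is not the rehh implementation; it ignores missing data, the stopping rules and the unphased case.

```r
# Illustrative sketch only; ehhs_sketch is a hypothetical helper, not part of rehh.
# haplo: matrix with one row per chromosome and one column per marker
# focal: column index of the focal marker
ehhs_sketch <- function(haplo, focal) {
  n <- nrow(haplo)
  homozygosity <- function(cols) {
    # Collapse the alleles of each chromosome over the interval into one key
    keys <- apply(haplo[, cols, drop = FALSE], 1, paste, collapse = "_")
    # Number of chromosomes sharing each distinct haplotype
    counts <- as.numeric(table(keys))
    sum(counts * (counts - 1)) / (n * (n - 1))
  }
  ehhs <- vapply(seq_len(ncol(haplo)),
                 function(t) homozygosity(focal:t), numeric(1))
  # nEHHS: the same values normalized to 1 at the focal marker
  list(ehhs = ehhs, nehhs = ehhs / ehhs[focal])
}

haplo <- rbind(c(0, 1, 0),
               c(0, 1, 0),
               c(0, 1, 1),
               c(1, 0, 1))
res <- ehhs_sketch(haplo, focal = 2)
```

In this toy example the homozygosity drops to the right of the focal marker because the third marker splits the haplotypes that were still identical at the focal marker.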
Gautier, M. and Naves, M. (2011). Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Molecular Ecology, 20, 3128-3143.
Klassmann, A. and Gautier, M. (2020). Detecting selection using Extended Haplotype Homozygosity-based statistics on unphased or unpolarized data (preprint). https://doi.org/10.22541/au.160405572.29972398/v1
Sabeti, P.C. et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832-837.
Sabeti, P.C. et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913-918.
Tang, K. and Thornton, K.R. and Stoneking, M. (2007). A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biology, 5, e171.
Voight, B.F. and Kudaravalli, S. and Wen, X. and Pritchard, J.K. (2006). A map of recent positive selection in the human genome. PLoS Biology, 4, e72.
library(rehh)
# example haplohh object (280 haplotypes, 1424 SNPs)
# see ?haplohh_cgu_bta12 for details
data(haplohh_cgu_bta12)
# computing EHHS statistics for the marker "F1205400",
# which displays a strong signal of selection
ehhs <- calc_ehhs(haplohh_cgu_bta12, mrk = "F1205400")