
Note: a newer version (2.1.5) of this package is available.

updog

Updog provides a suite of methods for genotyping polyploids from next-generation sequencing (NGS) data. It does this while accounting for many common features of NGS data: allele bias, overdispersion, sequencing error, and (possibly) outlying observations. It is named updog for “Using Parental Data for Offspring Genotyping” because we originally developed the method for full-sib populations, but it now works for more general populations. The method is described in detail in Gerard et al. (2018) <doi:10.1534/genetics.118.301468>. Additional details concerning prior specification are described in Gerard and Ferrão (2019) <doi:10.1093/bioinformatics/btz852>.
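To illustrate how sequencing error, allele bias, and overdispersion fit together, here is a minimal base-R sketch of the per-individual read-count likelihood. It assumes the parameterization suggested by the eta_fun(), xi_fun(), and dbetabinom() entries in the function list below, as we read Gerard et al. (2018): sequencing error eps shifts the dosage proportion k/K, bias h distorts it further, and reference counts follow a beta-binomial with that mean and overdispersion rho. The helper names and default parameter values are hypothetical and for illustration only; this is not updog's implementation.

```r
# Minimal sketch of the read-count model (assumed parameterization).
# adj_seq(), adj_bias(), dbb(), and geno_loglik() are hypothetical helpers,
# not functions exported by updog.

# Sequencing-error adjustment: probability a read shows the reference allele
# when the individual's reference-allele proportion is p = k / ploidy.
adj_seq <- function(p, eps) {
  p * (1 - eps) + (1 - p) * eps
}

# Allele-bias adjustment: h = 1 means no bias; h < 1 favors the reference allele.
adj_bias <- function(eta, h) {
  eta / (h * (1 - eta) + eta)
}

# Beta-binomial pmf with mean mu and overdispersion rho in (0, 1).
# The underlying beta has shapes mu * c and (1 - mu) * c with c = (1 - rho) / rho.
dbb <- function(x, size, mu, rho, log = FALSE) {
  c_prec <- (1 - rho) / rho
  shape1 <- mu * c_prec
  shape2 <- (1 - mu) * c_prec
  ld <- lchoose(size, x) + lbeta(x + shape1, size - x + shape2) -
    lbeta(shape1, shape2)
  if (log) ld else exp(ld)
}

# Log-likelihood of each dosage 0:ploidy for one individual with
# y reference reads out of n total reads (illustrative parameter defaults).
geno_loglik <- function(y, n, ploidy, eps = 0.005, h = 1, rho = 0.01) {
  k <- 0:ploidy
  xi <- adj_bias(adj_seq(k / ploidy, eps), h)
  dbb(y, n, xi, rho, log = TRUE)
}

ll <- geno_loglik(y = 12, n = 30, ploidy = 4)
which.max(ll) - 1  # maximum-likelihood dosage (flat prior)
```

A positive eps keeps the adjusted mean strictly inside (0, 1), which the beta-binomial shapes require; with eps = 0, dosages 0 and K would give a degenerate beta.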

The main function is flexdog(), which provides many options for the distribution of the genotypes in your sample. Novel genotype distributions include the class of proportional normal distributions (model = "norm") and the class of discrete unimodal distributions (model = "ash"). The default is model = "norm" because it is the most robust to varying genotype distributions, but feel free to use more specialized priors if you have more information on the data.
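As a rough sketch of the two novel prior classes, assuming the forms described in Gerard and Ferrão (2019) as we understand them: the proportional normal class sets pi_k proportional to exp(-(k - mu)^2 / (2 sigma2)) on k = 0, ..., K. The discrete unimodal class is represented here as a mixture of discrete uniforms that all contain the mode, as the get_uni_rep() and get_probk_vec() entries in the function list suggest. The exact mixture representation updog uses may differ. Both function names below are hypothetical.

```r
# Proportional normal genotype distribution on 0:ploidy (assumed form).
prop_norm <- function(ploidy, mu, sigma2) {
  k <- 0:ploidy
  w <- exp(-(k - mu)^2 / (2 * sigma2))
  w / sum(w)
}

# Discrete unimodal distribution with mode `mode`, built as a mixture of
# discrete uniforms on {j, ..., mode} (j <= mode) or {mode, ..., j} (j > mode).
# `weights` holds one nonnegative mixing weight per j in 0:ploidy.
unimodal_mix <- function(ploidy, mode, weights) {
  stopifnot(length(weights) == ploidy + 1, all(weights >= 0), sum(weights) > 0)
  k <- 0:ploidy
  pmf <- numeric(ploidy + 1)
  for (j in k) {
    support <- seq(min(j, mode), max(j, mode))
    component <- as.numeric(k %in% support) / length(support)
    pmf <- pmf + weights[j + 1] * component
  }
  pmf / sum(pmf)
}

prop_norm(ploidy = 6, mu = 2.5, sigma2 = 1)
unimodal_mix(ploidy = 6, mode = 2, weights = c(1, 0, 2, 0, 1, 0, 0))
```

Every uniform component is nonincreasing as it moves away from the mode, so any nonnegative mixture of them is unimodal at that mode. This is why the class can flexibly fit many shapes while staying unimodal.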

multidog() is a convenience function that lets you run flexdog() over many SNPs. It has support for parallel computing.

Also provided are:

  • An experimental function mupdog(), which allows for correlation between the individuals’ genotypes while jointly estimating the genotypes of the individuals at all provided SNPs. The implementation uses a variational approximation. This is designed for samples where the individuals share a complex relatedness structure (e.g., siblings, cousins, uncles, and half-siblings). Currently, there are no guarantees about this function’s performance.
  • Functions to simulate genotypes (rgeno()) and read-counts (rflexdog()). These support all of the models available in flexdog().
  • Functions to evaluate oracle genotyping performance: oracle_joint(), oracle_mis(), oracle_mis_vec(), and oracle_cor(). We mean “oracle” in the sense that we assume the entire data-generating process is known (i.e., the genotype distribution, sequencing error rate, allele bias, and overdispersion are all known). These are good approximations when there are many individuals, even if read depth is not large.
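To make the oracle idea concrete, here is a base-R sketch of an oracle misclassification rate at a fixed read depth. With everything known, the oracle picks the genotype k maximizing P(Y = y, genotype = k) for each reference count y, so its error is 1 minus the sum over y of that maximum. The read-count model mirrors the one in Gerard et al. (2018) as we understand it, and oracle_mis_sketch() is a hypothetical helper. It is not updog's oracle_mis(), whose exact arguments may differ.

```r
# Oracle (Bayes) misclassification error at read depth n, assuming the
# genotype distribution `dist` and all read-count parameters are known.
oracle_mis_sketch <- function(n, ploidy, dist, eps = 0.005, h = 1, rho = 0.01) {
  stopifnot(n >= 1, length(dist) == ploidy + 1)
  k <- 0:ploidy
  eta <- (k / ploidy) * (1 - eps) + (1 - k / ploidy) * eps  # sequencing error
  xi <- eta / (h * (1 - eta) + eta)                         # allele bias
  c_prec <- (1 - rho) / rho                                 # beta precision
  y <- 0:n
  # joint[y + 1, k + 1] = P(Y = y, genotype = k), beta-binomial reads
  joint <- sapply(seq_along(k), function(i) {
    a <- xi[i] * c_prec
    b <- (1 - xi[i]) * c_prec
    dist[i] * exp(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
  })
  # The oracle is correct with probability sum_y max_k P(Y = y, genotype = k).
  1 - sum(apply(joint, 1, max))
}

# Hardy-Weinberg genotype distribution for a tetraploid, allele frequency 0.3
hw <- dbinom(0:4, size = 4, prob = 0.3)
errs <- sapply(c(5, 20, 80), oracle_mis_sketch, ploidy = 4, dist = hw)
errs
```

The error can never exceed 1 - max(dist), the error of always guessing the most common genotype, and it should shrink as depth grows.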

The original updog package is now named updogAlpha and may be found here.

See also ebg, fitPoly, TET, and polyRAD. Our best “competitor” is probably fitPoly, though polyRAD has some nice ideas for utilizing population structure and linkage disequilibrium.

See NEWS for the latest updates on the package.

Vignettes

I’ve included many vignettes in updog, which you can access online here.

Bug Reports

If you find a bug or want an enhancement, please submit an issue here.

Installation

You can install updog from CRAN in the usual way:

install.packages("updog")

You can install the current (unstable) version of updog from GitHub with:

# install.packages("devtools")
devtools::install_github("dcgerard/updog")

How to Cite

Please cite

Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468.

Or, using BibTeX:

@article {gerard2018genotyping,
    author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
    title = {Genotyping Polyploids from Messy Sequencing Data},
    volume = {210},
    number = {3},
    pages = {789--807},
    year = {2018},
    doi = {10.1534/genetics.118.301468},
    publisher = {Genetics},
    issn = {0016-6731},
    URL = {https://doi.org/10.1534/genetics.118.301468},
    journal = {Genetics}
}

If you are using the proportional normal prior class (model = "norm") or the unimodal prior class (model = "ash"), then please also cite

Gerard, D., & Ferrão, L. F. V. (2019). Priors for Genotyping Polyploids. Bioinformatics (in press). doi: 10.1093/bioinformatics/btz852.

Or, using BibTeX:

@article{gerard2019priors,
    author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim},
    title = {Priors for Genotyping Polyploids},
    journal = {Bioinformatics},
    year = {2019},
    month = {11},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz852},
    note = {btz852},
}

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


Monthly Downloads: 415
Version: 1.2.0
License: GPL-3
Maintainer: David Gerard
Last Published: January 28th, 2020

Functions in updog (1.2.0)

compute_all_phifk

Computes \(\Phi^{-1}(F(k \mid K, \alpha_j, \rho_i))\) for all possible \((i, j, k)\).
dbetabinom

The Beta-Binomial Distribution
compute_all_post_prob

Computes every posterior probability for each dosage level for each individual at each SNP.
dbernbinom

Special case of the beta-binomial where the underlying beta is a Bernoulli with mean mu.
dbetabinom_double

The density function of the beta-binomial distribution.
compute_all_log_bb

Calculates the log-density for every individual by snp by dosage level.
convolve_up

Convolution between two discrete probability mass functions with support on 0:K.
dbetabinom_alpha_beta_double

Density function of the beta-binomial using the shape parameterization.
ashpen_fun

dpen_dh

Derivative of \(-\log(h) - (\log(h) - \mu_h)^2 / (2\sigma_h^2)\) with respect to \(h\).
doutdist

The outlier distribution we use. Currently it is a beta-binomial with mean 1/2 and overdispersion 1/3 (so the underlying beta is a uniform on (0, 1)).
df_deps

Derivative of f with respect to eps.
dr_pen

dlbeta_dc

Derivative of the log-beta density with respect to c where \(c = (1 - \tau)/\tau\) where \(\tau\) is the overdispersion parameter.
dxi_dh

Derivative of xi-function with respect to bias parameter.
dxi_df

Derivative of xi with respect to f.
eta_fun

Adjusts allele dosage p by the sequencing error rate eps.
dlbeta_dtau

Derivative of the log-beta-binomial density with respect to the overdispersion parameter.
get_bivalent_probs

Returns segregation probabilities, pairing representation and number of ref alleles given the ploidy.
format_multidog

dlbeta_dxi

Derivative of the log-betabinomial density with respect to the mean of the underlying beta.
expit

The expit (logistic) function.
flexdog_obj_out

Log-likelihood that flexdog maximizes when outliers are present.
flexdog_obj

flexdog_full

Flexible genotyping for polyploids from next-generation sequencing data.
flexdog

Flexible genotyping for polyploids from next-generation sequencing data.
dpen_deps

Derivative of \(-\log(\epsilon(1 - \epsilon)) - (\operatorname{logit}(\epsilon) - \mu_{\epsilon})^2 / (2\sigma_{\epsilon}^2)\) with respect to \(\epsilon\).
get_wik_mat

grad_for_weighted_lnorm

eta_double

Adjusts allele dosage p by the sequencing error rate eps.
get_inner_weights

Compute inner weights for updating the mixing proportions when using ash model.
elbo

The evidence lower bound
log_sum_exp_2

Log-sum-exponential trick using just two doubles.
initialize_pivec

is.flexdog

Tests if an argument is a flexdog object.
is.multidog

Tests if an argument is a multidog object.
get_wik_mat_out

E-step in flexdog where we now allow an outlier distribution.
obj_for_mu_sigma2_wrapper

Wrapper for obj_for_mu_sigma2 so that I can use it in optim.
obj_for_rho

Objective function when updating a single inbreeding coefficient.
dc_dtau

Derivative of \(c = (1 - \tau) / \tau\) with respect to \(\tau\).
dlbeta_deps

Derivative of the log-beta-binomial density with respect to the sequencing error rate.
dlbeta_dh

Derivative of log-betabinomial density with respect to bias parameter.
get_probk_vec

Obtain the genotype distribution given the distribution of discrete uniforms.
f1_obj

Objective for mixture of known dist and uniform dist.
flex_update_pivec

Update the distribution of genotypes from various models.
obj_for_eps

Objective function for updating sequencing error rate, bias, and overdispersion parameters.
pen_seq_error

Penalty on sequencing error rate.
obj_for_mu_sigma2

Objective function when updating mu and sigma2.
oracle_mis_vec_from_joint

Get the oracle misclassification error rates (conditional on true genotype) directly from the joint distribution of the genotype and the oracle estimator.
grad_for_mu_sigma2_wrapper

Gradient for obj_for_mu_sigma2_wrapper with respect to muSigma2; a wrapper for grad_for_mu_sigma2.
logit

The logit function.
pivec_from_segmats

Function to get the segregation probabilities from the distributions of each component and the weights of each component.
grad_for_weighted_lbb

obj_for_alpha

Objective function when updating alpha
mupout

get_bivalent_probs_dr

get_q_array

Return the probabilities of an offspring's genotype given its parental genotypes for all possible combinations of parental and offspring genotypes. This is for species with polysomal inheritance and bivalent, non-preferential pairing.
get_conv_inner_weights

Get the inner weights used for the em update in update_pp_f1 when there are more than two bivalent components for one of the parents.
get_uni_rep

Get the representation of a discrete unimodal probability distribution.
oracle_joint

The joint probability of the genotype and the genotype estimate of an oracle estimator.
multidog

Fit flexdog to multiple SNPs.
mupdog

Using correlation between individuals for genotyping.
oracle_cor

Calculates the correlation between the true genotype and an oracle estimator.
get_dimname

Returns a vector of character strings that are all of the possible combinations of the reference allele and the non-reference allele.
summary.mupdog

oracle_cor_from_joint

Calculate the correlation of the oracle estimator with the true genotype from the joint distribution matrix.
get_hyper_weights

Return mixture weights needed to obtain a hypergeometric distribution.
grad_for_eps

snpdat

GBS data from Shirasawa et al. (2017).
oracle_plot

oracle_mis

Calculate oracle misclassification error rate.
oracle_mis_from_joint

Get the oracle misclassification error rate directly from the joint distribution of the genotype and the oracle estimator.
plot.flexdog

plot.multidog

plot.mupdog

uitdewilligen

Subset of individuals and SNPs from Uitdewilligen et al. (2013).
oracle_mis_vec

Returns the oracle misclassification rates for each genotype.
post_prob

Variational posterior probability of having dosage A alleles when the ploidy is ploidy, the allele frequency is alpha, the individual-specific overdispersion parameter is rho, the variational mean is mu, and the variational variance is sigma2.
grad_for_mu_sigma2

Gradient for obj_for_mu_sigma2 with respect to mu and sigma2.
log_sum_exp

Log-sum-exponential trick.
is.mupdog

Tests if its argument is a mupdog object.
obj_for_weighted_lbb

uni_em

EM algorithm to fit weighted ash objective.
pbetabinom_double

The distribution function of the betabinomial. This is generally only advisable if q is relatively small.
obj_for_weighted_lnorm

rgeno

qbetabinom_double

The quantile function of the beta-binomial distribution parameterized by mean and overdispersion parameter.
update_dr

Same as update_pp_f1 but I exclusively use the EM (instead of also Brent's method), and I allow for priors on the mixing proportions.
plot_geno

Make a genotype plot.
rflexdog

wem

update_pp_f1

Function to update the parameters in the preferential pairing F1 model.
pen_bias

Penalty on bias parameter.
rbetabinom_int

One draw from the beta-binomial distribution parameterized by mean and overdispersion parameter.
uni_em_const

EM algorithm to fit weighted ash objective with a uniform mixing component.
uni_obj_const

updog-package

updog: Flexible Genotyping for Polyploids
pp_brent_obj

Objective function when doing Brent's method in update_pp_f1 when one parent only has two mixing components.
update_pp_s1

Same as update_pp_f1 but only allow s1.
uni_obj

xi_double

Adjusts allele dosage p by the sequencing error rate eps and the allele bias h.
update_R

Update the underlying correlation matrix.
xi_fun

Adjusts allele dosage p by the sequencing error rate eps and the allele bias h.
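The get_q_array() entry above describes offspring genotype probabilities under polysomic inheritance with bivalent, non-preferential pairing. A minimal base-R sketch of that idea follows, assuming no double reduction, which is consistent with bivalent pairing. Under that assumption, a parent with dosage ell passes on K/2 of its K chromosomes drawn without replacement, so the gamete dosage is hypergeometric. The offspring dosage is the sum of the two gamete dosages, which makes its distribution a discrete convolution (the operation the convolve_up() entry names). The functions below are hypothetical and are not claimed to reproduce updog's internal get_q_array().

```r
# Gamete dosage distribution for a parent with `ell` reference alleles,
# assuming bivalent, non-preferential pairing and no double reduction.
gamete_probs <- function(ell, ploidy) {
  stopifnot(ploidy %% 2 == 0, ell >= 0, ell <= ploidy)
  # ploidy / 2 chromosomes drawn without replacement: hypergeometric
  dhyper(0:(ploidy / 2), m = ell, n = ploidy - ell, k = ploidy / 2)
}

# Offspring dosage distribution: convolution of the two gamete distributions.
offspring_probs <- function(ell1, ell2, ploidy) {
  g1 <- gamete_probs(ell1, ploidy)
  g2 <- gamete_probs(ell2, ploidy)
  q <- numeric(ploidy + 1)
  for (i in seq_along(g1)) {
    for (j in seq_along(g2)) {
      q[i + j - 1] <- q[i + j - 1] + g1[i] * g2[j]
    }
  }
  names(q) <- 0:ploidy
  q
}

# All combinations, indexed [ell1 + 1, ell2 + 1, k + 1].
q_array_sketch <- function(ploidy) {
  out <- array(0, dim = rep(ploidy + 1, 3))
  for (l1 in 0:ploidy) for (l2 in 0:ploidy) {
    out[l1 + 1, l2 + 1, ] <- offspring_probs(l1, l2, ploidy)
  }
  out
}

offspring_probs(1, 1, ploidy = 4)  # simplex x simplex tetraploid cross
```

For example, a duplex tetraploid parent (ell = 2) yields gamete dosages 0, 1, 2 in the familiar 1:4:1 ratio.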