Note: a newer version (2.1.5) of this package is available.

updog

Updog provides a suite of methods for genotyping polyploids from next-generation sequencing (NGS) data. It does this while accounting for many common features of NGS data: allele bias, overdispersion, sequencing error, and (possibly) outlying observations. It is named updog for “Using Parental Data for Offspring Genotyping” because we originally developed the method for full-sib populations, but it now works for more general populations. The method is described in detail in Gerard et al. (2018) <doi:10.1534/genetics.118.301468>. Additional details concerning prior specification are described in Gerard and Ferrão (2019) <doi:10.1093/bioinformatics/btz852>.

The main function is flexdog(), which provides many options for the distribution of the genotypes in your sample. Novel genotype distributions include the class of proportional normal distributions (model = "norm") and the class of discrete unimodal distributions (model = "ash"). The default is model = "norm" because it is the most robust to varying genotype distributions, but feel free to use more specialized priors if you have more information on the data.
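A minimal usage sketch follows. It assumes updog is installed and that the bundled uitdewilligen dataset is a list with elements refmat, sizemat, and ploidy; using the first SNP column is an arbitrary choice for illustration.

```r
# Sketch only: fit flexdog() to one SNP of the bundled example data.
# Guarded so the script still runs when updog is not installed.
if (requireNamespace("updog", quietly = TRUE)) {
  data("uitdewilligen", package = "updog")

  refvec  <- uitdewilligen$refmat[, 1]   # reference-allele read counts
  sizevec <- uitdewilligen$sizemat[, 1]  # total read counts
  ploidy  <- uitdewilligen$ploidy        # ploidy of the individuals

  fout <- updog::flexdog(refvec  = refvec,
                         sizevec = sizevec,
                         ploidy  = ploidy,
                         model   = "norm",
                         verbose = FALSE)

  table(fout$geno)  # tabulate the estimated dosages
}
```

Swapping model = "norm" for another supported value changes only the assumed genotype distribution; the read-count inputs stay the same.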

Also provided are:

  • An experimental function mupdog(), which allows for correlation between the individuals’ genotypes while jointly estimating the genotypes of the individuals at all provided SNPs. The implementation uses a variational approximation. This is designed for samples where the individuals share a complex relatedness structure (e.g. siblings, cousins, uncles, half-siblings, etc). Right now there are no guarantees about this function’s performance.
  • Functions to simulate genotypes (rgeno()) and read-counts (rflexdog()). These support all of the models available in flexdog().
  • Functions to evaluate oracle genotyping performance: oracle_joint(), oracle_mis(), oracle_mis_vec(), and oracle_cor(). We mean “oracle” in the sense that we assume that the entire data generation process is known (i.e. the genotype distribution, sequencing error rate, allele bias, and overdispersion are all known). These are good approximations when there are a lot of individuals (but not necessarily large read-depth).
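The quantities these simulators and oracle calculations work with (genotype distribution, sequencing error rate, allele bias, overdispersion) can be sketched in base R. The block below is an illustrative re-implementation, not the code behind rgeno() or rflexdog(). It assumes binomial (Hardy-Weinberg) genotypes, the error-and-bias adjustment described in Gerard et al. (2018), and beta-binomial read counts with overdispersion tau. The names adjust_prob and sim_reads, and the default values, are hypothetical.

```r
# Adjust the true reference-allele proportion p for sequencing error (eps)
# and allele bias (h). h = 1 means no bias.
adjust_prob <- function(p, eps, h) {
  f <- p * (1 - eps) + (1 - p) * eps   # sequencing error step
  f / (h * (1 - f) + f)                # allele bias step
}

# Simulate genotypes and reference read counts for one SNP.
sim_reads <- function(sizevec, ploidy, allele_freq,
                      eps = 0.005, h = 1, tau = 0.001) {
  n    <- length(sizevec)
  geno <- rbinom(n, size = ploidy, prob = allele_freq)  # assumed HWE genotypes
  xi   <- adjust_prob(geno / ploidy, eps, h)            # mean read proportion
  cc   <- (1 - tau) / tau                               # c = (1 - tau) / tau
  # Beta-binomial draw: beta-distributed proportion, then binomial counts.
  prob   <- rbeta(n, shape1 = xi * cc, shape2 = (1 - xi) * cc)
  refvec <- rbinom(n, size = sizevec, prob = prob)
  data.frame(geno = geno, refvec = refvec, sizevec = sizevec)
}

set.seed(1)
sim <- sim_reads(sizevec = rep(100, 200), ploidy = 4, allele_freq = 0.5)
head(sim)
```

Because every parameter is known here, data simulated this way matches the "oracle" setting described above, where only the genotypes are unknown.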

The original updog package is now named updogAlpha.

See also ebg, fitPoly, TET, and polyRAD. Our best “competitor” is probably fitPoly, though polyRAD has some nice ideas for utilizing population structure and linkage disequilibrium.

See NEWS for the latest updates on the package.

Vignettes

I’ve included many vignettes in updog, which are also available online.

Bug Reports

If you find a bug or want an enhancement, please submit an issue on the GitHub repository (dcgerard/updog).

Installation

You can install updog from CRAN in the usual way:

install.packages("updog")

You can install the current (unstable) version of updog from GitHub with:

# install.packages("devtools")
devtools::install_github("dcgerard/updog")

CVXR

If you want to use the use_cvxr = TRUE option in flexdog (not generally recommended), you will need to install the CVXR package. Before I could install CVXR in Ubuntu, I had to run in the terminal

sudo apt-get install libmpfr-dev

and then run in R

install.packages("Rmpfr")

How to Cite

Please cite

Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468.

Or, using BibTeX:

@article {gerard2018genotyping,
    author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
    title = {Genotyping Polyploids from Messy Sequencing Data},
    volume = {210},
    number = {3},
    pages = {789--807},
    year = {2018},
    doi = {10.1534/genetics.118.301468},
    publisher = {Genetics},
    issn = {0016-6731},
    URL = {https://doi.org/10.1534/genetics.118.301468},
    journal = {Genetics}
}

If you are using the proportional normal prior class (model = "norm") or the unimodal prior class (model = "ash"), then please also cite

Gerard, D., & Ferrão, L. F. V. (2019). Priors for Genotyping Polyploids. Bioinformatics (in press). doi: 10.1093/bioinformatics/btz852.

Or, using BibTeX:

@article{gerard2019priors,
    author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim},
    title = {Priors for Genotyping Polyploids},
    journal = {Bioinformatics},
    year = {2019},
    month = {11},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz852},
    note = {btz852},
}

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Monthly Downloads: 415
Version: 1.1.3
License: GPL-3
Maintainer: David Gerard
Last Published: November 21st, 2019

Functions in updog (1.1.3)

dbernbinom

Special case of the beta-binomial where the underlying beta is Bernoulli with mean mu.
dbetabinom_double

The density function of the beta-binomial distribution.
compute_all_log_bb

Calculates the log-density for every individual by snp by dosage level.
dc_dtau

Derivative of \(c = (1 - \tau) / \tau\) with respect to \(\tau\).
dbetabinom_alpha_beta_double

Density function of the beta-binomial with the shape parameterization.
compute_all_post_prob

Computes every posterior probability for each dosage level for each individual at each SNP.
ashpen_fun

convolve_up

Convolution between two discrete probability mass functions with support on 0:K.
dlbeta_dtau

Derivative of the log-beta-binomial density with respect to the overdispersion parameter.
dlbeta_dxi

Derivative of the log-betabinomial density with respect to the mean of the underlying beta.
dpen_dh

Derivative of \(-\log(h) - (\log(h) - \mu_h)^2 / (2\sigma_h^2)\) with respect to \(h\).
get_inner_weights

Compute inner weights for updating the mixing proportions when using ash model.
dxi_dh

Derivative of xi-function with respect to bias parameter.
dxi_df

Derivative of xi with respect to f.
get_hyper_weights

Return mixture weights needed to obtain a hypergeometric distribution.
eta_fun

Adjusts allele dosage p by the sequencing error rate eps.
doutdist

The outlier distribution we use. Right now it is just a beta-binomial with mean 1/2 and overdispersion 1/3 (so the underlying beta is just a uniform from 0 to 1).
expit

The expit (logistic) function.
dpen_deps

Derivative of \(-\log(\epsilon(1 - \epsilon)) - (\operatorname{logit}(\epsilon) - \mu_{\epsilon})^2 / (2\sigma_{\epsilon}^2)\) with respect to \(\epsilon\).
grad_for_weighted_lnorm

get_conv_inner_weights

Get the inner weights used for the EM update in update_pp_f1 when there are more than two bivalent components for one of the parents.
df_deps

Derivative of f with respect to eps.
get_dimname

Returns a vector of character strings that are all of the possible combinations of the reference allele and the non-reference allele.
get_uni_rep

Get the representation of a discrete unimodal probability distribution.
obj_for_mu_sigma2

Objective function when updating mu and sigma2.
oracle_joint

The joint probability of the genotype and the genotype estimate of an oracle estimator.
obj_for_mu_sigma2_wrapper

Wrapper for obj_for_mu_sigma2 so that I can use it in optim.
get_wik_mat

obj_for_alpha

Objective function when updating alpha
grad_for_weighted_lbb

flexdog_full

Flexible genotyping for polyploids from next-generation sequencing data.
dr_pen

dlbeta_deps

Derivative of the log-beta-binomial density with respect to the sequencing error rate.
flexdog

Flexible genotyping for polyploids from next-generation sequencing data.
post_prob

Variational posterior probability of having dosage A alleles when the ploidy is ploidy, the allele frequency is alpha, the individual-specific overdispersion parameter is rho, the variational mean is mu, and the variational variance is sigma2.
update_R

Update the underlying correlation matrix.
flex_update_pivec

Update the distribution of genotypes from various models.
dlbeta_dh

Derivative of log-betabinomial density with respect to bias parameter.
oracle_mis

Calculate oracle misclassification error rate.
f1_obj

Objective for mixture of known dist and uniform dist.
initialize_pivec

dlbeta_dc

Derivative of the log-beta density with respect to c, where \(c = (1 - \tau)/\tau\) and \(\tau\) is the overdispersion parameter.
get_q_array

Return the probabilities of an offspring's genotype given its parental genotypes for all possible combinations of parental and offspring genotypes. This is for species with polysomal inheritance and bivalent, non-preferential pairing.
get_probk_vec

Obtain the genotype distribution given the distribution of discrete uniforms.
elbo

The evidence lower bound
obj_for_eps

Objective function for updating sequencing error rate, bias, and overdispersion parameters.
uni_obj

obj_for_weighted_lnorm

rgeno

oracle_cor_from_joint

Calculate the correlation of the oracle estimator with the true genotype from the joint distribution matrix.
eta_double

Adjusts allele dosage p by the sequencing error rate eps.
xi_fun

Adjusts allele dosage p by the sequencing error rate eps and the allele bias h.
oracle_cor

Calculates the correlation between the true genotype and an oracle estimator.
pen_seq_error

Penalty on sequencing error rate.
pen_bias

Penalty on bias parameter.
get_wik_mat_out

E-step in flexdog where we now allow an outlier distribution.
is.flexdog

Tests if an argument is a flexdog object.
plot_geno

Make a genotype plot.
uni_em_const

EM algorithm to fit weighted ash objective with a uniform mixing component.
plot.mupdog

is.mupdog

Tests if its argument is a mupdog object.
oracle_mis_from_joint

Get the oracle misclassification error rate directly from the joint distribution of the genotype and the oracle estimator.
pp_brent_obj

Objective function when doing Brent's method in update_pp_f1 when one parent only has two mixing components.
grad_for_eps

log_sum_exp

Log-sum-exponential trick.
rflexdog

update_pp_s1

Same as update_pp_f1 but only allow s1.
log_sum_exp_2

Log-sum-exponential trick using just two doubles.
get_bivalent_probs

Returns segregation probabilities, pairing representation and number of ref alleles given the ploidy.
qbetabinom_double

The quantile function of the beta-binomial distribution parameterized by mean and overdispersion parameter.
get_bivalent_probs_dr

grad_for_mu_sigma2

Gradient for obj_for_mu_sigma2 with respect to mu and sigma2.
grad_for_mu_sigma2_wrapper

Gradient for obj_for_mu_sigma2_wrapper with respect to muSigma2; a wrapper for grad_for_mu_sigma2.
flexdog_obj

flexdog_obj_out

Log-likelihood that flexdog maximizes when outliers are present.
uitdewilligen

Subset of individuals and SNPs from Uitdewilligen et al. (2013).
rbetabinom_int

One draw from the beta-binomial distribution parameterized by mean and overdispersion parameter.
wem

xi_double

Adjusts allele dosage p by the sequencing error rate eps and the allele bias h.
logit

The logit function.
mupout

update_dr

Same as update_pp_f1 but I exclusively use the EM (instead of also Brent's method), and I allow for priors on the mixing proportions.
oracle_plot

mupdog

Multi-SNP updog.
updog-package

updog: Flexible Genotyping for Polyploids
obj_for_weighted_lbb

update_pp_f1

Function to update the parameters in the preferential pairing F1 model.
obj_for_rho

Objective function when updating a single inbreeding coefficient.
pbetabinom_double

The distribution function of the betabinomial. This is generally only advisable if q is relatively small.
snpdat

GBS data from Shirasawa et al. (2017).
summary.mupdog

oracle_mis_vec

Returns the oracle misclassification rates for each genotype.
plot.flexdog

oracle_mis_vec_from_joint

Get the oracle misclassification error rates (conditional on true genotype) directly from the joint distribution of the genotype and the oracle estimator.
pivec_from_segmats

Function to get the segregation probabilities from the distributions of each component and the weights of each component.
uni_em

EM algorithm to fit weighted ash objective.
uni_obj_const

compute_all_phifk

Computes \(\Phi^{-1}(F(k \mid K, \alpha_j, \rho_i))\) for all possible (i,j,k).
dbetabinom

The Beta-Binomial Distribution
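
Several of the functions listed above describe a beta-binomial parameterized by its mean and an overdispersion parameter, with \(c = (1 - \tau)/\tau\). The base-R sketch below shows one way such a density can be written. It assumes the beta shape parameters are mean times c and (1 - mean) times c, which is consistent with the outlier distribution described above (mean 1/2 and overdispersion 1/3 giving a uniform beta). The helper name dbb_mu_tau is hypothetical and is not part of updog.

```r
# Beta-binomial density with mean mu in (0, 1) and overdispersion tau in (0, 1).
# Hypothetical helper written with base R only.
dbb_mu_tau <- function(x, size, mu, tau, log = FALSE) {
  cc    <- (1 - tau) / tau   # c = (1 - tau) / tau
  alpha <- mu * cc           # first beta shape parameter
  beta  <- (1 - mu) * cc     # second beta shape parameter
  ld <- lchoose(size, x) +
    lbeta(x + alpha, size - x + beta) -
    lbeta(alpha, beta)
  if (log) ld else exp(ld)
}

# Mean 1/2 and overdispersion 1/3 give alpha = beta = 1 (a uniform beta),
# so every count in 0:size is equally likely.
dens <- dbb_mu_tau(0:10, size = 10, mu = 0.5, tau = 1/3)
```

Working on the log scale with lchoose() and lbeta() keeps the computation stable at large read depths, where the individual terms would otherwise overflow or underflow.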