WGR2 (EM): Expectation-Maximization WGR

Description

Univariate whole-genome regression models for estimating breeding values, fitted by expectation-maximization and implemented in C++.

Usage

emRR(y, gen, df = 10, R2 = 0.5)
emBA(y, gen, df = 10, R2 = 0.5)
emBB(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBC(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBCpi(y, gen, df = 10, R2 = 0.5, Pi = 0.75)
emBL(y, gen, R2 = 0.5, alpha = 0.02)
emEN(y, gen, R2 = 0.5, alpha = 0.02)
emDE(y, gen, R2 = 0.5)
emML(y, gen, D = NULL)
lasso(y, gen)
emCV(y, gen, k = 5, n = 5, Pi = 0.75, alpha = 0.02,
     df = 10, R2 = 0.5, avg=TRUE, llo=NULL, tbv=NULL, ReturnGebv = FALSE)

Value

The EM functions return a list with the intercept ($mu$), the regression coefficients ($b$), the fitted values ($hat$), and the estimated intraclass correlation ($h2$).

The function emCV returns the predictive ability of each model, that is, the correlation between the predicted and observed values from $k$-fold cross-validations repeated $n$ times.
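As a rough illustration of what "predictive ability" means here, the base R sketch below repeats a k-fold cross-validation and correlates predictions with the held-out observations. It does not reproduce the internals of emCV: the simulated data, the helper `ridge_predict`, the fixed penalty `lambda`, and the choice to average correlations over folds are all assumptions made for the example.

```r
set.seed(42)
n_obs <- 120
n_mrk <- 40
# Simulated genotypes coded 0/1/2 and a simulated phenotype (hypothetical data)
X <- matrix(sample(0:2, n_obs * n_mrk, replace = TRUE), n_obs, n_mrk)
y <- as.vector(X %*% rnorm(n_mrk, sd = 0.3) + rnorm(n_obs))

# Hypothetical stand-in for a fitted model: ridge regression with a fixed penalty
ridge_predict <- function(y_trn, X_trn, X_tst, lambda = 10) {
  ctr <- colMeans(X_trn)
  Xc <- sweep(X_trn, 2, ctr)                      # center markers on training means
  b <- solve(crossprod(Xc) + diag(lambda, ncol(Xc)),
             crossprod(Xc, y_trn - mean(y_trn)))
  as.vector(mean(y_trn) + sweep(X_tst, 2, ctr) %*% b)
}

k <- 5      # number of folds
reps <- 3   # number of repetitions
pa <- replicate(reps, {
  fold <- sample(rep(seq_len(k), length.out = n_obs))  # random fold assignment
  mean(sapply(seq_len(k), function(f) {
    tst <- fold == f
    pred <- ridge_predict(y[!tst], X[!tst, , drop = FALSE], X[tst, , drop = FALSE])
    cor(pred, y[tst])                                  # predicted vs. observed
  }))
})
mean(pa)    # average predictive ability across repetitions
```

Each element of `pa` is the mean correlation over the folds of one repetition, so `mean(pa)` summarizes all repetitions in a single number.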

Arguments

y: Numeric vector of the response variable, with length $n$. NA is not allowed.
gen: Numeric matrix containing the genotypic data. A matrix with $n$ rows of observations and $m$ columns of molecular markers.
df: Hyperprior degrees of freedom of variance components.
R2: Expected R2, used to calculate the prior shape (de los Campos et al. 2013).
Pi: Value between 0 and 1. Expected probability that a marker has a null effect (or 1-Pi if Pi>0.5).
alpha: Value between 0 and 1. Intensity of L1 variable selection.
D: NULL or numeric vector of marker weights, with one weight per marker (length $m$).
k: Integer. Number of folds in the k-fold cross-validation.
n: Integer. Number of cross-validations to perform.
avg: Logical. If TRUE, return the average across cross-validations; if FALSE, return the correlations within each cross-validation.
llo: NULL or a vector (numeric or factor) with the same length as y. If provided, the cross-validations are performed as Leave a Level Out (LLO). This argument allows the user to predefine the splits. This argument overrides k and n.
tbv: NULL or numeric vector of 'true breeding values' (length $n$) against which the cross-validation predictions are compared. If NULL, the phenotypes are used as the prediction target.
ReturnGebv: Logical. If TRUE, it returns a list with the average marker values and fitted values across all cross-validations, in addition to the regular output.
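To make the llo argument more concrete, the sketch below builds leave-a-level-out splits from a grouping factor in base R. It only illustrates the idea of holding out one level at a time and is not the package's own code; the grouping labels and the `splits` structure are hypothetical.

```r
set.seed(7)
y <- rnorm(12)
# Hypothetical grouping, e.g. environments or families, same length as y
llo <- factor(rep(c("envA", "envB", "envC"), each = 4))

# One split per level: that level is held out, all other levels are used for training
splits <- lapply(levels(llo), function(lv) {
  list(test = which(llo == lv), train = which(llo != lv))
})
names(splits) <- levels(llo)

splits$envB$test   # 5 6 7 8
```

With three levels there are three splits, which is consistent with llo overriding k and n.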

Author

Alencar Xavier

Details

The whole-genome regression model is:

$$y = \mu + Xb + e$$

where $y$ is the response variable, $\mu$ is the intercept, $X$ is the genotypic matrix, $b$ is the vector of allele substitution effects (regression coefficients), and $e$ is the residual term. The function emCV provides k-fold cross-validation for model evaluation.
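The model above can be sketched in base R by simulating data from it and recovering the marker effects with a plain ridge solution. This is a minimal sketch under stated assumptions, not the EM algorithm used by the package: the data are simulated, every object name is hypothetical, and the penalty `lambda` is fixed by hand, whereas the EM functions estimate variance components.

```r
set.seed(1)
n_obs <- 100
n_mrk <- 50
# Simulated genotypes coded 0/1/2, true marker effects, and intercept (all hypothetical)
X <- matrix(sample(0:2, n_obs * n_mrk, replace = TRUE), n_obs, n_mrk)
b_true <- rnorm(n_mrk, sd = 0.2)
mu_true <- 10
y <- as.vector(mu_true + X %*% b_true + rnorm(n_obs))   # y = mu + Xb + e

# Ridge solution with a fixed penalty (assumption made for this sketch)
lambda <- 10
Xc <- scale(X, center = TRUE, scale = FALSE)   # center each marker column
mu_hat <- mean(y)                              # intercept of the centered model
b_hat <- solve(crossprod(Xc) + diag(lambda, n_mrk),
               crossprod(Xc, y - mu_hat))
y_hat <- as.vector(mu_hat + Xc %*% b_hat)      # fitted values

cor(y, y_hat)   # in-sample agreement between observed and fitted values
```

Because the markers are centered, `mu_hat` is the mean of `y` rather than an estimate of `mu_true` on the original 0/1/2 scale.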

Examples

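     if (FALSE) {
       library(bWGR)   # package providing the EM functions and the tpod data
       data(tpod)      # loads the phenotype vector y and the genotype matrix gen
       # 3-fold cross-validation repeated 3 times
       emCV(y, gen, k = 3, n = 3)
     }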