cpgen (version 0.1)

cSSBR: Single Step Bayesian Regression

Description

This function runs Single Step Bayesian Regression (SSBR) for the prediction of breeding values in a unified model that incorporates both genotyped and non-genotyped individuals (Fernando et al., 2014).

Usage

cSSBR(data, M, M.id, X=NULL, par_random=NULL, scale_e=0, df_e=0, 
      niter=5000, burnin=2500, seed=NULL, verbose=TRUE)

Arguments

data
data.frame with four columns: id, sire, dam, y
M
Marker Matrix for genotyped individuals
M.id
Vector of length nrow(M) representing rownames for M
X
Fixed effects design matrix of type matrix or dgCMatrix. If omitted, a column vector of ones will be assigned. Must have as many rows as data.
par_random
as in clmm
niter
as in clmm
burnin
as in clmm
verbose
as in clmm
scale_e
as in clmm
df_e
as in clmm
seed
as in clmm
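
As a minimal sketch of how a non-default X might be built (this is not part of the original example): the covariate `sex` below is hypothetical and invented purely for illustration. The only requirement taken from the argument description above is that X has as many rows as data.

```r
# Toy data as in the Examples section, plus a hypothetical covariate `sex`
dat <- data.frame(id   = 1:6,
                  sire = c(rep(NA, 3), rep(1, 3)),
                  dam  = c(rep(NA, 3), 2, 2, 3),
                  y    = c(NA, 0.45, 0.87, 1.26, 1.03, 0.67),
                  sex  = factor(c("m", "f", "f", "m", "f", "m")))

# Dense design matrix: intercept plus a sex effect.
# y is not part of the formula, so the missing phenotype does not drop a row.
X <- model.matrix(~ sex, data = dat)

# X must have one row per row of data
stopifnot(nrow(X) == nrow(dat))
```

The resulting matrix could then be passed as `X = X`. For a sparse design matrix, `Matrix::sparse.model.matrix` is one possible route to a dgCMatrix, though that has not been checked against cSSBR here.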

Value

  • List of 4 + number of random effects, as in clmm, plus
  • SSBR - List of 7:
    • ids - ids used in the model (ordered as in other model terms)
    • y - phenotype vector
    • X - Design matrix for fixed effects
    • Marker_Matrix - Combined Marker Matrix including imputed and genotyped individuals
    • Z_residual - Design Matrix used to model the residual error for the imputed individuals
    • ginverse_residual - Submatrix of the inverse of the numerator relationship matrix. Used to model the residual error for the imputed individuals
    • Breeding_Values - Predicted Breeding Values for all animals in data that have genotypes and/or phenotypes

Details

The function sets up the following model using cSSBR.setup:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{M\alpha} + \mathbf{Z\epsilon} + \mathbf{e}$$

The matrix $\mathbf{M}$ denotes a combined marker matrix consisting of actual and imputed marker covariates. Best linear predictions of gene content (Gengler et al., 2007) for the non-genotyped individuals are obtained by solving $\mathbf{A}^{11}\hat{\mathbf{M}}_1 = -\mathbf{A}^{12}\mathbf{M}_2$ (Fernando et al., 2014). $\mathbf{A}^{11}$ and $\mathbf{A}^{12}$ are submatrices of the inverse of the numerator relationship matrix, which can be computed directly from the pedigree (Henderson, 1976). The subscripts 1 and 2 denote non-genotyped and genotyped individuals, respectively.

This very sparse equation system is solved using a sparse Cholesky solver provided by the Eigen library. The residual imputation error $\mathbf{\epsilon}$ has variance $(\mathbf{A}^{11})^{-1}\sigma_{\epsilon}^2$.
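
The imputation step described above can be illustrated on the toy pedigree from the Examples section. This is a sketch in dense base R, not the package's internal code: the helper `ainv_no_inbreeding` is hypothetical, it applies Henderson's rules under the assumption of no inbreeding (which holds for this pedigree), and it uses `solve` where cpgen uses a sparse Cholesky solver.

```r
# Hypothetical helper: inverse numerator relationship matrix via
# Henderson's rules, assuming no inbreeding. Parents are given as row indices.
ainv_no_inbreeding <- function(sire, dam) {
  n <- length(sire)
  Ainv <- matrix(0, n, n)
  for (i in seq_len(n)) {
    parents <- c(sire[i], dam[i])
    parents <- parents[!is.na(parents)]
    # Inverse Mendelian sampling variance: 1, 4/3 or 2 for 0, 1 or 2 known parents
    d_inv <- 1 / (1 - length(parents) / 4)
    w <- c(1, rep(-0.5, length(parents)))  # weights for the animal and its parents
    idx <- c(i, parents)
    Ainv[idx, idx] <- Ainv[idx, idx] + d_inv * tcrossprod(w)
  }
  Ainv
}

sire <- c(NA, NA, NA, 1, 1, 1)
dam  <- c(NA, NA, NA, 2, 2, 3)

# Marker genotypes of the genotyped animals (ids 1 to 3)
M2 <- rbind(c(1,2,1,1,0,0,1,2,1,0),
            c(2,1,1,1,2,0,1,1,1,1),
            c(0,1,0,0,2,1,2,1,1,1))

geno    <- 1:3  # subscript 2: genotyped
nongeno <- 4:6  # subscript 1: non-genotyped

Ainv <- ainv_no_inbreeding(sire, dam)
A11  <- Ainv[nongeno, nongeno]
A12  <- Ainv[nongeno, geno]

# Solve A^{11} M1_hat = -A^{12} M2 for the imputed marker covariates
M1_hat <- solve(A11, -A12 %*% M2)

# Covariance structure of the imputation residual, up to sigma_epsilon^2
V_eps <- solve(A11)
```

For this pedigree every non-genotyped animal has two genotyped, unrelated parents, so each row of `M1_hat` is simply the average of the parental genotypes and `V_eps` is 0.5 times the identity matrix, i.e. the Mendelian sampling variance.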

References

Fernando, R.L., Dekkers, J.C., Garrick, D.J.: A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46(1), 50 (2014)

Gengler, N., Mayeres, P., Szydlowski, M.: A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1(01), 21 (2007)

Henderson, C.R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32(1), 69-83 (1976)

See Also

cSSBR.setup, clmm

Examples

library(cpgen)

# example dataset

id <- 1:6
sire <- c(rep(NA,3),rep(1,3))
dam <- c(rep(NA,3),2,2,3)

# phenotypes
y <- c(NA, 0.45, 0.87, 1.26, 1.03, 0.67)

dat <- data.frame(id=id,sire=sire,dam=dam,y=y)


# Marker genotypes
M <- rbind(c(1,2,1,1,0,0,1,2,1,0),
           c(2,1,1,1,2,0,1,1,1,1),
           c(0,1,0,0,2,1,2,1,1,1))

M.id <- 1:3

var_y <- var(y,na.rm=TRUE)
var_e <- (10*var_y / 21)
var_a <- var_e 
var_m <- var_e / 10

# large degrees of freedom put emphasis on the prior
df <- 500

par_random <- list(list(method="ridge", scale=var_m, df=df),
                   list(method="ridge", scale=var_a, df=df))

set_num_threads(1)
mod<-cSSBR(data = dat,
           M=M,
           M.id=M.id,
           par_random=par_random,
           scale_e = var_e,
           df_e=df,
           niter=50000,
           burnin=30000)

# check marker effects
print(round(mod[[4]]$posterior$estimates_mean,digits=2))

# check breeding value prediction:
print(round(mod$SSBR$Breeding_Values,digits=2))